Objectives

The objective of the Chief project is to provide an integrated simulation environment for
studying and evaluating various issues in designing parallel systems, including machine
architectures, parallelizing compiler techniques, and parallel algorithms.

The objective of the Delta project is to provide a facility to allow rapid prototyping of
parallelizing compilers that can target different machine architectures.
Major Accomplishments

1. Developed a program instrumentation and simulation facility, MaxPar, that can measure
the maximum inherent parallelism in application programs and also can measure the 
effectiveness of various parallelizing compiler techniques. 

2. Developed parallel simulation kernels on the Alliant FX/8 parallel computer based on
conservative (Chandy-Misra) and optimistic (Time Warp) event-driven models.

3. Developed a parallel simulation kernel, PARSIM, on the Alliant FX/8 parallel computer
that employs a hybrid time- and event-driven model to speed up simulations. PARSIM also
runs on uniprocessor machines such as high-performance workstations.

4. Designed and implemented a high-level language CARL (Computer Architecture Research
Language), which is based upon C and is used for writing simulators. 

5. Developed preprocessors to translate CARL into C and C++ code. The resulting code can
be compiled with a standard compiler to allow the simulations to be carried out either on a 
workstation or on a parallel computer such as the Alliant FX/8. 

6. Developed a high-level graphical interface to assist in simulator configuration and to run 
suites of benchmark executions on the Chief simulators. 

7. Developed a bitmapped graphical interface for PARSIM, PARSIM-UI, that allows a user to 
display and control the state of the simulation. Its operation may be customized with an 
interpreted language to display simulation-specific information according to user 
preferences. 

8. Developed a data display tool that plots the results of simulation runs on a bitmapped 
workstation. 

9. Implemented two pilot parallel simulators on the Alliant FX/8. They can run a FORTRAN 
program suite through a parallelizing compiler to generate parallel traces. In one case, the 
resulting traces drive the simulation of a shared-memory multiprocessor system with a 
multistage shuffle-exchange network. In the other case, the traces drive the simulation of 
an eight-processor system similar to an FX/8 system. 
1. Chief Project Overview

Chief is a parallel simulation environment for studying parallel systems. Figure 1 shows its 
basic structure. 



Figure 1 — Chief Project Overview


Parallel systems are studied by creating simulators and driving those simulators with 
benchmark programs. These benchmark programs are restructured according to the architecture 
of the target system, and parallel traces are created. 

A simulator for the target system is constructed from the architecture specification. The 
core of the simulator is a simulation kernel (based upon one of three paradigms). The simulator 
includes a powerful bitmapped window interface that provides the user with a complete view of 
and control over the execution. The user can vary a set of parameters to the simulated system. 
The simulator is driven by the parallel traces described above. 

Statistics are collected during simulation runs. The Chief environment provides tools to 
examine these statistics and plot their values against the simulation parameters. 

A separate tool, MaxPar, can be used to instrument programs to measure the maximum 
inherent parallelism within them. The results MaxPar generates are an upper bound on the 
available parallelism, and can be used to evaluate the effectiveness of the restructuring compilers
and of the simulated system.
2. Simulation Facilities


2.1. Execution-driven Simulation: MaxPar 

MaxPar is an execution-driven parallelism profiling and extraction facility. It instruments
an application (such as a Perfect Club benchmark) to collect statistics based upon the actual
execution of the program. It can determine the inherent maximum parallelism of an application
program and the optimal parallelism of the program under system constraints (such as the number
of processors, storage-related data dependences, and the synchronization overhead). MaxPar can
locate the bottlenecks in the program. Finally, MaxPar can generate parallel execution traces for
the program.

MaxPar instruments an application program to record timing and scheduling information 
for each data object, where a data object is either a scalar variable or an array element. To store 
this information, MaxPar associates additional variables, called shadow variables, with each data 
object. For each variable X, the read shadow trX records the last time X was read and the write 
shadow twX records the last time X was written. Given the operation 


C = A op B


where C, A, and B can be scalar variables or array elements, and op can be any arithmetic or
logical operator, the equations used to update the shadows are:


$$tw_C = \mathit{compute\_time}(op) + \max(tw_C,\ tr_C,\ tw_A,\ tw_B)$$
$$tr_A = \max(tr_A,\ tw_C)$$
$$tr_B = \max(tr_B,\ tw_C)$$


When a data object is read, its write shadow is checked to determine the earliest possible
time for the read operation to proceed. The read can proceed only after the previous write has
completed. If the read and write are from different processors, the overhead resulting from data
synchronization is computed. The read shadow is then updated to that time. When a data object
is written, both its read shadow and its write shadow are checked to compute correct timing and
to perform any necessary synchronization.
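As a minimal sketch of the shadow bookkeeping (not MaxPar's actual code), the C fragment below applies the three update equations to a single operation C = A op B. The `Shadow` type and `shadow_binop` function are hypothetical names, times are integer steps, and `cost` stands for compute_time(op); synchronization overhead and processor limits are deliberately left out.

```c
/* Hypothetical shadow pair attached to one data object (scalar or array element). */
typedef struct {
    long tr; /* read shadow: last time the object was read     */
    long tw; /* write shadow: last time the object was written */
} Shadow;

static long max2(long x, long y) { return x > y ? x : y; }

/* Apply the shadow updates for C = A op B, where cost plays the role of
   compute_time(op). Returns the time at which C is written. */
long shadow_binop(Shadow *c, Shadow *a, Shadow *b, long cost)
{
    /* C can be written only after its previous write and read (output and
       anti dependences) and after A and B have been produced (flow). */
    long ready = max2(max2(c->tw, c->tr), max2(a->tw, b->tw));
    c->tw = cost + ready;

    /* A and B are read by this operation; record the read at the
       completion time of C, as in the equations. */
    a->tr = max2(a->tr, c->tw);
    b->tr = max2(b->tr, c->tw);
    return c->tw;
}
```

With all shadows starting at zero and unit cost, `x = a + b` completes at time 1 and the dependent `y = x * a` at time 2, while an independent `z = a + b` also completes at time 1; a later write to `a` must wait for its last read (an anti-dependence) and completes at time 3.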

MaxPar also takes other system features into consideration. The number of processors in 
the target system may be specified as a finite number or may be infinite. Parallelism may be 
measured at one of four levels of granularity: operation-level, statement-level, loop-level, or 
subprogram-level. MaxPar can also take into account scheduling schemes and the 
synchronization overhead for data synchronization and barrier synchronization. The anti- and 
output-dependences of a program can be eliminated by an optional dynamic storage allocation 
scheme. MaxPar can compute the amount of additional storage required to achieve this pure 
data-flow type of execution. 

MaxPar instruments the application program, producing a new source program. This is
compiled on the host machine, linked with runtime libraries, and executed. The program
produces computationally correct answers. In addition, it produces an execution profile by
counting the number of operations that can be executed at each time instance. A parallel trace
can also be generated. Figure 2 shows the profile of a 512-point fast Fourier transform. The
nine “peaks” represent the high parallelism present at the start of each phase of the FFT. The
plot does not include the first part of the program, which performs initialization. The parallelism
in this example is measured at the loop level with an unlimited number of processors and with
no overhead due to scheduling and synchronization.


Figure 2 — MaxPar’s Execution Profile of a 512-point FFT (vertical axis: number of operations executed)
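An execution profile of this kind is a histogram over time steps. Assuming the completion time of every operation has already been computed (for instance from its write shadow), a hypothetical `build_profile` helper could tally it as follows; the returned peak corresponds to the tallest spike in a plot such as Figure 2. This is an illustrative sketch, not MaxPar's runtime library.

```c
#include <stddef.h>

/* Build a parallelism profile: profile[t] = number of operations that
   complete at time step t. Times outside [0, nsteps) are ignored.
   Returns the peak (maximum) parallelism found. */
long build_profile(const long *finish, size_t nops, long *profile, size_t nsteps)
{
    long peak = 0;
    for (size_t t = 0; t < nsteps; t++)
        profile[t] = 0;
    for (size_t i = 0; i < nops; i++) {
        if (finish[i] >= 0 && (size_t)finish[i] < nsteps) {
            long n = ++profile[finish[i]];   /* one more op at this step */
            if (n > peak)
                peak = n;
        }
    }
    return peak;
}
```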


2.2. Parallel Simulation Kernels

A Chief simulation consists of a group of modules interconnected by nets. A module 
encapsulates some function, presenting it to the “outside world” through a set of inputs and 
outputs. The inputs and outputs of the modules within a system are connected together by nets. 
The inputs and outputs may be scalars or arrays (with a maximum of three dimensions), the size
of which can be specified during runtime configuration. Modules may be implemented directly 
as a set of low-level functions that directly read values from input nets and write values to output 
nets. Alternatively, modules may be constructed from other modules. The simulation is 
implemented as a hierarchy of modules. The root of the module hierarchy is the simulation 
itself. Many common low-level modules will be provided in a simulation library. 

In order to reduce the time required to simulate large parallel systems, Chief provides three 
different parallel discrete event simulation (PDES) kernels. Simulators built with these kernels 
share a common user interface, and a single language is used to write code for all three 
simulation paradigms. 

The PDES kernels include a conservative approach (based upon the work of Chandy and
Misra), an optimistic approach (based upon the Time Warp technique), and an approach that
employs a hybrid of time-driven and event-driven techniques (called PARSIM, for parallel
simulator). It is well known that the performance of these PDES approaches is problem- and
application-dependent. By providing all three simulation kernels with a single user interface and
simulation language, Chief gives users the ability to write one simulator specification and select
one of the three approaches at compile time. All three approaches are currently implemented on
an Alliant FX/8 system. PARSIM is also implemented on uniprocessor Sun Microsystems
machines.

The user describes each simulation component and the interconnection of components that
forms the system. The component definitions are written in the language CARL (described in
section 3). Two kinds of components can be defined: behavioral components and hierarchical
components. Behavioral components are described by defining their local state, their inputs and
outputs, the actions that should be taken when one or more of their inputs change, and the
initialization that should be performed when the simulation starts or is re-executed. Hierarchical
components are described by defining the subcomponents that constitute them and the manner in
which subcomponents are connected to one another and to the inputs and outputs of the
hierarchical component.

A Chief simulator is constructed from a collection of these component definitions. The 
construction stage comprises two independent phases: the translation phase and the code
generation phase. During the translation phase, the component definitions are translated into C
structures (for PARSIM) or C++ classes (for Chandy-Misra and Time Warp) that define the
various types of the participating components. The data members of each C structure or C++ 
class represent the state associated with the respective components in the simulation. 

For Chandy-Misra and Time Warp simulations, the function members of the C++ class 
constitute the set of routines needed to simulate the respective components. The system is 
represented as a collection of logical processes, each of which simulates a component and 
communicates with other components. Each logical process is the set of member functions 
defined in its class. An important goal of the construction stage is to minimize the
communication overhead and maximize the potential parallelism in the execution of the
simulation. To achieve this goal, we partition the logical processes into sets and assign the
simulation of each of these sets to a processor. This assignment is achieved by generating
appropriate code to be executed by each physical process.

For PARSIM simulations, the structure definitions are created in a header file and the 
executable routines that simulate the component are created in a separate code file. The code file 
is compiled along with the header files of its own component and any included subcomponents 
to create an executable module. A complete simulation consists of a linked set of simulation
modules.

The execution of the simulation is the final stage of the simulation process. The Chandy-Misra
and Time Warp paradigms are based on the exchange of messages to convey information
from one component to another. Chandy-Misra also incorporates a means for avoiding
deadlock. The machine on which we are developing this tool (an Alliant) is a shared-memory
machine; therefore, instead of using actual messages we use shared memory to convey
information and (in the Chandy-Misra case) to avoid deadlock. By doing so we reduce the cost
associated with the use of messages. Each component for which there are events to simulate is
extracted from the ready queue maintained by each physical process, and its outstanding events
are simulated. When there are no more events to simulate, the component blocks waiting for new
events (messages) to arrive, and control is transferred to another ready logical process. This
cycle is repeated until all components have been simulated up to a certain (virtual) time, which
has been defined by the user as the End_of_the_simulation_time.
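The conservative rule behind the Chandy-Misra kernel can be sketched in a few lines. In this hypothetical, sequential C model, each logical process keeps a clock per input channel (the timestamp of the latest message received there, including the null messages of the classic Chandy-Misra deadlock-avoidance scheme, which is assumed here; Chief conveys the same information through shared memory) and may only simulate events no later than the minimum of those clocks. The names `LogicalProcess`, `safe_time`, `receive`, and `process_safe` are illustrative, not the Alliant implementation.

```c
#include <limits.h>
#include <stddef.h>

#define MAX_IN 8   /* hypothetical maximum number of input channels */

typedef struct {
    size_t nin;                /* number of input channels in use          */
    long   chan_clock[MAX_IN]; /* timestamp of last message on each input  */
    long   lvt;                /* local virtual time of this LP            */
} LogicalProcess;

/* An LP may process any event whose timestamp does not exceed the smallest
   channel clock: no input can later deliver an earlier message. */
long safe_time(const LogicalProcess *lp)
{
    long t = LONG_MAX;
    for (size_t i = 0; i < lp->nin; i++)
        if (lp->chan_clock[i] < t)
            t = lp->chan_clock[i];
    return t;
}

/* Record a (possibly null) message with timestamp ts on channel ch.
   Channels deliver in timestamp order, so the clock only moves forward. */
void receive(LogicalProcess *lp, size_t ch, long ts)
{
    if (ts > lp->chan_clock[ch])
        lp->chan_clock[ch] = ts;
}

/* Consume pending events (sorted by timestamp) up to the safe time.
   Returns how many were processed; the LP blocks on the rest. */
size_t process_safe(LogicalProcess *lp, const long *events, size_t nev)
{
    long limit = safe_time(lp);
    size_t k = 0;
    while (k < nev && events[k] <= limit) {
        lp->lvt = events[k];   /* the component's action would run here */
        k++;
    }
    return k;
}
```

An LP whose inputs have advanced to times 6 and 4 can process an event at time 3 but must block on one at time 5 until the slower channel moves past 5.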

PARSIM employs a combination of the time-driven and event-driven approaches to 
simulation. PARSIM maintains a system event queue that is a time-ordered list of event lists. 
Each sublist contains events that occur at the same simulation time. PARSIM also maintains 
event queues for each of the nets affected by clock-induced events. 

PARSIM executes events in groups. It dequeues the first list of events from the system
event queue. Then, in parallel, it evaluates these events, resulting in new values being assigned to
nets. Each component that is affected by the change in the nets may specify an “action routine”
that updates that component’s status. PARSIM makes a list of all of the action routines that must
be processed. After all of the nets have been updated, all of the action routines are evaluated in
parallel. These routines may, in turn, post additional events to the global event queue.
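The group-at-a-time cycle can be illustrated with a small sequential C sketch. It assumes a flat pending-event array searched for the earliest time (standing in for PARSIM's time-ordered list of event lists), at most one action routine per net, and integer net values; `Sim`, `post`, `step`, and `inverter_action` are hypothetical names. The two phases that PARSIM runs in parallel appear here as two ordinary loops.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_EVENTS 64   /* hypothetical capacity of the pending-event array */
#define MAX_NETS   16   /* hypothetical number of nets in the model         */

typedef struct { long time; int net; int value; } Event;

typedef struct Sim Sim;
typedef void (*Action)(Sim *s, int net);  /* action routine run when a net changes */

struct Sim {
    Event  queue[MAX_EVENTS];   /* pending events, unsorted for brevity       */
    size_t nq;
    int    net_value[MAX_NETS];
    Action on_change[MAX_NETS]; /* at most one action routine per net, or NULL */
    long   now;                 /* current simulation time                    */
};

/* Schedule net := value at the given time (dropped if the array is full). */
void post(Sim *s, long time, int net, int value)
{
    if (s->nq < MAX_EVENTS)
        s->queue[s->nq++] = (Event){ time, net, value };
}

/* Example component: an inverter from net 0 to net 1 with a delay of 2. */
void inverter_action(Sim *s, int net)
{
    post(s, s->now + 2, 1, !s->net_value[net]);
}

/* Execute one group of simultaneous events. Returns false when none remain. */
bool step(Sim *s)
{
    if (s->nq == 0)
        return false;

    /* The earliest pending time identifies the group to execute. */
    long t = s->queue[0].time;
    for (size_t i = 1; i < s->nq; i++)
        if (s->queue[i].time < t)
            t = s->queue[i].time;
    s->now = t;

    /* Phase 1: apply every event of the group to its net. */
    bool changed[MAX_NETS] = { false };
    size_t keep = 0;
    for (size_t i = 0; i < s->nq; i++) {
        Event e = s->queue[i];
        if (e.time != t) {
            s->queue[keep++] = e;          /* later event: leave it queued */
        } else if (s->net_value[e.net] != e.value) {
            s->net_value[e.net] = e.value;
            changed[e.net] = true;
        }
    }
    s->nq = keep;

    /* Phase 2: only after all nets are updated, run the affected action
       routines; they may post new events for later times. */
    for (int n = 0; n < MAX_NETS; n++)
        if (changed[n] && s->on_change[n] != NULL)
            s->on_change[n](s, n);
    return true;
}
```

With `inverter_action` attached to net 0, an event setting net 0 to 0 at time 5 (after it was 1) causes net 1 to be driven to 1 at time 7.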
3. CARL — Computer Architecture Research Language

The Chief project provides three different paths by which simulators can be constructed, 
according to the PARSIM, Chandy-Misra, and Time Warp paradigms, respectively. Although 
the simulation techniques are different, in all three cases the simulated system is specified as a 
connected set of hierarchically-defined components. 

Components are written in a semi-abstract language called CARL. The use of this 
language frees the component designer from the need to know the low-level details of the 
various implementations. More importantly, component definitions written in CARL can be 
incorporated into any of the Chief simulators simply by using an appropriate preprocessor to 
convert CARL code to C (for PARSIM) or C++ (for Chandy-Misra and Time Warp). 

A component description in CARL consists of sections of C-like code headed by CARL 
keywords. The keywords are COMPTYPE, INPUTS, OUTPUTS, SUBCOMPONENTS, VAR, 
ACTION, INIT, STRUCTURE, BEGIN, and END. The COMPTYPE, INIT, STRUCTURE, and
BEGIN-END sections contain executable statements modelling a component’s behavior and
specifying its internal structure.

#define ADD 0
#define SUB 1
#define AND 2
#define OR  3

COMPTYPE Alu16 (speed)
int speed;

INPUTS
    short in[2] : alu_eval;
    char op : alu_eval;

OUTPUTS
    short sum;

VAR
    int Speed;

ACTION alu_eval
    switch (op) {
    case ADD:
        sum = in[0] + in[1] after Speed;
        break;
    case SUB:
        sum = in[0] - in[1] after Speed;
        break;
    case AND:
        sum = in[0] & in[1] after Speed;
        break;
    case OR:
        sum = in[0] | in[1] after Speed;
        break;
    }

BEGIN
    Speed = speed;
END

Figure 3 — CARL definition of a 16-bit ALU 

Figure 3 shows the CARL definition of a simple ALU, capable of performing four 
operations upon its two 16-bit inputs. The component, whose type is Alu16, has one
parameter: the delay between a change to its inputs and a new value on its outputs.
The Chief project includes two preprocessors. PSPP, the PARSIM Preprocessor, converts
CARL into C. PSPP is a compiled program that uses the tools lex and yacc to read CARL
programs. It generates two files: a header file that defines PARSIM data structures and a C code
file that contains module creation, connection, initialization, and action routines. The host 
machine’s C compiler will convert the code files into object modules that can be linked with the 
PARSIM runtime libraries and user interface to form a PARSIM simulator. 

C2CMTW, the CARL-to-Chandy-Misra/Time Warp preprocessor, converts CARL into
C++. C2CMTW, like PSPP, is an executable program. It generates two files: a header file that
defines C++ classes for each component type and a C++ code file that contains the definitions of
the class member functions. The host machine’s C++ compiler will convert these files into 
object modules that can be linked with either the Chandy-Misra runtime libraries or the 
Time Warp libraries to create a simulator. 

4. UI — PARSIM User Interface 

The PARSIM user interface (PARSIM-UI, or simply UI) displays information in bitmapped 
windows using the X11 window system. It provides control facilities for starting, stopping,
continuing, and breakpointing simulation runs. Nets can be viewed graphically. By creating 
several windows, the user can interact with the simulation from multiple contexts. 

The core of PARSIM-UI is an execution engine that parses and executes commands written 
in a simple language. The graphical interface “wrapper” accepts input in the form of menu 
selections, button presses, etc. and transforms it into commands that are interpreted by the 
engine. The user-interface language is also directly available, so that the user can customize his
or her debugging sessions as necessary.

PARSIM-UI can directly access objects in the simulation system: components, inputs, 
outputs, and nets. It also provides and operates upon simulator variables. Variables may contain 
integer, floating-point, or string values, or may contain one of three special typed values: error,
high-impedance, and unknown. Their type is dynamic: an assignment to the variable sets the
type as well as the value. The value of an uninitialized variable is the integer zero.
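A dynamically typed variable of this kind is commonly represented as a tagged union. The following C sketch is one hypothetical representation, not PARSIM-UI's source: `VarType` lists the six value kinds named above, every setter overwrites the tag together with the value, and `var_new` returns the integer zero for an uninitialized variable.

```c
#include <stdlib.h>
#include <string.h>

/* The six kinds of value a (hypothetical) PARSIM-UI variable can hold. */
typedef enum { V_INT, V_FLOAT, V_STRING, V_ERROR, V_HIGHZ, V_UNKNOWN } VarType;

typedef struct {
    VarType type;                        /* tag travels with the value */
    union { long i; double f; char *s; } u;
} Var;

/* Uninitialized variables hold the integer zero. */
Var var_new(void) { Var v; v.type = V_INT; v.u.i = 0; return v; }

/* Release any storage owned by the current value. */
static void var_clear(Var *v) { if (v->type == V_STRING) free(v->u.s); }

void var_set_int(Var *v, long x)     { var_clear(v); v->type = V_INT;   v->u.i = x; }
void var_set_float(Var *v, double x) { var_clear(v); v->type = V_FLOAT; v->u.f = x; }

/* t must be V_ERROR, V_HIGHZ, or V_UNKNOWN: these carry no payload. */
void var_set_special(Var *v, VarType t) { var_clear(v); v->type = t; v->u.i = 0; }

/* Copy the string before freeing the old value, so self-assignment is safe.
   Returns 0 on success, -1 if memory is exhausted (v is then unchanged). */
int var_set_string(Var *v, const char *x)
{
    char *copy = malloc(strlen(x) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, x);
    var_clear(v);
    v->type = V_STRING;
    v->u.s = copy;
    return 0;
}
```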

A set of operators combines components, nets, variables, and literal constants into more 
complex expressions. An expression may be used whenever the PARSIM user interface expects 
a value. In particular, an expression may be used within a component or net array subscript. 
Function calls may also appear within expressions. They are called using the syntax

function_name(args)

where function_name is the function name and args is a comma-separated list of expressions that
represent the arguments to the function. The number and type of arguments are function-specific.
PARSIM-UI provides a set of standard built-in functions, which provide access to the
simulation state. Users can define additional functions. 

The primary interface to PARSIM-UI is graphical; however, in recognition of the fact that
text input is sometimes necessary, macros can be used to hide some of the programming-language
appearance from the user. A set of built-in macros is provided. The user may define any
number of new macros and is free to redefine the built-in macros if he or she so desires.

The PARSIM-UI language provides primitives for grouping, iteration (WHILE),
conditionals (if), function definition, and macro definition. The syntax is vaguely similar to
Algol or C.

PARSIM-UI provides a powerful breakpoint facility. Breakpoint conditions are expressed 
as an arithmetic expression and therefore may depend upon nets, constants, and variables. This
provides great flexibility; for instance, it is possible to check if the currently-addressed register 
memory reference trace that can be used to drive multiprocessor simulators created by the
Chief tools. 

5.3. Alliant FX/8 Traces 

Alliant supplies an emulator for the FX/8. Programs are compiled with the parallel Alliant
Fortran or C compiler. The resulting object modules are linked with runtime routines to create
executables. These are then emulated to produce Alliant-specific memory reference traces. The
programs produce computationally correct results, and the traces are a very realistic reflection of
the program’s parallel behavior. However, because the traces are machine-specific, they cannot
be made to accommodate a memory hierarchy or a configuration consisting of more than eight
processors.

6. Data Visualization Tool 

The Chief visualization tool plots data for display on a bitmapped workstation. The data is
collected from a suite of simulation runs in which simulation parameters are varied from run to
run. The data from each run is stored in a file. A separate description file identifies all of the
data items. The visualization tool reads the description file and all of the data files. The user can
plot any data item against any simulation parameter while constraining the values of other
simulation parameters.
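Selecting the points for one such plot amounts to filtering the stored runs. This C sketch assumes every run has already been loaded into a fixed-size record of parameter values and data items; the sizes `NPARAM` and `NITEM`, the `Run` layout, and `select_points` are hypothetical stand-ins for whatever the description file defines. A run is kept only when each constrained parameter equals its fixed value.

```c
#include <stddef.h>

#define NPARAM 3   /* hypothetical number of simulation parameters per run */
#define NITEM  2   /* hypothetical number of data items per run            */

typedef struct { double param[NPARAM]; double item[NITEM]; } Run;

/* Collect (x, y) points: x is parameter xp, y is data item yi. A run is kept
   only if every constrained parameter k (constrained[k] != 0) equals fixed[k].
   xs and ys must have room for nruns entries. Returns the number of points. */
size_t select_points(const Run *runs, size_t nruns, int xp, int yi,
                     const int *constrained, const double *fixed,
                     double *xs, double *ys)
{
    size_t n = 0;
    for (size_t r = 0; r < nruns; r++) {
        int ok = 1;
        for (int k = 0; k < NPARAM; k++)
            if (constrained[k] && runs[r].param[k] != fixed[k]) {
                ok = 0;          /* violates a constraint: skip this run */
                break;
            }
        if (ok) {
            xs[n] = runs[r].param[xp];
            ys[n] = runs[r].item[yi];
            n++;
        }
    }
    return n;
}
```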

7. Top-Level Chief Environment 

All of the Chief tools are assembled into a top-level bitmapped environment. The 
environment guides the user through the creation of a simulator from a set of components stored 
in a component library. More than one version of some components may be archived, so the 
environment allows the user to view the current set of components and select the desired version 
for each one. Each component contains a set of parameters that control its behavior. The
environment extracts a complete list of parameters from the specified components and provides
mouse-driven tools that allow the user to specify new parameter values.

The environment provides a simple interface that allows the user to specify a set of 
compiler parameters, compile a benchmark, and generate a trace file. The editing of compiler 
parameters is similar to the editing of simulation parameters. In addition, the environment also
allows the user to invoke MaxPar to analyze the parallelism within the benchmark. 

When instructed to build a simulator, the environment will invoke the appropriate Chief 
preprocessor for each component definition (written in CARL), will invoke the system compiler 
to create object files for all components, and will link those object files with the appropriate 
kernel and user interface libraries. A simple command will execute the resulting simulator. 

The power of the Chief environment lies in its ability to execute a suite of compilation and 
simulation runs while varying the input parameters. The user specifies a set of values for each
parameter, and the environment will automatically compile the benchmark to produce a trace
file, build the simulator, and invoke the simulator with the trace file as input. The output from
each simulation run will be written to a separate file. The user can then use the Chief 
visualization tool described above to display these results graphically. 
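Enumerating such a sweep amounts to stepping through the Cartesian product of the per-parameter value lists. A minimal C sketch, assuming each parameter's candidate values are held in an array and identified by index, treats the index vector as an odometer; `next_combination` and `sweep_size` are hypothetical helpers, and the compile, build, and simulate steps the environment performs for each combination are left to the caller.

```c
#include <stddef.h>

/* Advance an odometer over a parameter sweep. idx[k] indexes the value
   list of parameter k, which has count[k] (> 0) values. The last
   parameter varies fastest. Returns 0 after the final combination
   (idx is then back at all zeros), 1 otherwise. */
int next_combination(size_t *idx, const size_t *count, size_t nparams)
{
    for (size_t k = nparams; k-- > 0; ) {
        if (++idx[k] < count[k])
            return 1;
        idx[k] = 0;   /* wrap this digit and carry into the next one */
    }
    return 0;
}

/* Total number of compile-and-simulate runs the sweep will perform. */
size_t sweep_size(const size_t *count, size_t nparams)
{
    size_t n = 1;
    for (size_t k = 0; k < nparams; k++)
        n *= count[k];
    return n;
}
```

A driver would typically loop `do { run_one(idx); } while (next_combination(idx, count, nparams));`, where `run_one` is a placeholder for one compile-and-simulate cycle whose output goes to its own file.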
8. Pilot Simulations

Two simulators have been developed to demonstrate the utility of the Chief environment. 
First, the simulations can be run in parallel, resulting in fast execution times. Furthermore, the
simulations are written in CARL, which is extremely modular, allowing faster initial code
development and making component models much easier to replace than in dedicated
simulation programs.

8.1. Cedar-like System Simulator

A simulator has been developed to simulate the Cedar global memory system. It consists
of models for the Omega networks, the global memories, and a simple processor. The simulator 
is driven by traces of Fortran programs generated by Parafrase. It allows different system 
configurations to be simulated by changing the size of the system and the size and configuration 
of the network switches, and by replacing the switch and memory component models to test, 
e.g., different internal buffering configurations.

The system model is a simplification of Cedar, in that processors are not clustered as in 
Cedar. Furthermore, the current processor model does not simulate the effects of caching or
cluster memory. Some of these effects can be accommodated by changing the way traces are 
gathered by Parafrase. More advanced processor models are also under development. 

8.2. Cedar Cluster Simulator 

A simulator has been developed to simulate a Cedar cluster. It consists of models for the 
caches, the cluster memory, and eight simple processors. Traces created by the Alliant FX/8 
emulator drive the simulation. 
Project Summary

The Delta Program Manipulation System * 


Gregory Jaxon David Padua Paul Petersen 
Center for Supercomputing Research and Development 
University of Illinois at Urbana-Champaign 

March 1, 1991 


Abstract 

This report summarizes the status of the Delta Program Manipulation System [Pad89] 
project at the expiration of its initial project development grant. Included are a review 
of the project’s objectives and surveys of the program manipulation tools developed, the 
environmental software supporting Delta, and the compiler research projects in which 
Delta has played a role. An appendix describes the Delta system in detail. 


1 Objectives 

Fortran 77 programs are portable to many computer architectures. But the program
characteristics that yield the best performance vary from machine to machine. The common
goal of researchers in automatic restructuring is to capture and preserve the meaning
of a program while varying the program structures that most influence its speed and
efficiency on different computer systems. Although a number of commercial and research
program restructurers have been written, the cost of exploring new techniques or
optimization strategies is still extremely high.

The Delta Program Manipulation System [Pad89] is an open system of tools and
components and a workbench environment for developing new compiler techniques in automatic
program restructuring. Included are: a Fortran parser; an extensive repertoire of
operations and data structures common to vectorizing and parallelizing compilers; and the tools
and methodology needed to generate and test new compilation methods and strategies. We 
believe that this approach can reduce the cost of research and development for advanced 
compilers in the same way that domain-specific languages (e.g. Mathematica) have reduced 
the cost of problem solving in other technical fields. 

Openness. An ‘open system’ is one which exposes its component parts for modification,
replacement, or reuse in new contexts. Several factors contribute to the openness achieved 
in Delta. 

* This work was supported by the National Aeronautics and Space Administration and the Defense
Advanced Research Projects Agency under Grant No. NASA NCC 2-559. Part of this work was carried
out by James R.B. Davies.
• The implementation language (SETL) is very high level. This means the amount of
text invested in any one design commitment tends to be small, and thus manageable. 

• An ‘applicative’ programming methodology has been followed in which components 
are relatively insensitive to the context in which they are used. 

• The central data structures in Delta are labelled maps. Because they are self-docu- 
menting and flexible, they are easy to use in new ways, or modify for new uses. 

• Environmental software has made the SETL source of Delta ‘content addressable’.

In the following sections we will describe parallelization and illustrate Delta’s program 
manipulation tools, then survey the environmental software supporting Delta and the com- 
piler technology research at CSRD in which Delta is beginning to play a part. Appendix A 
gives a detailed description of the Delta system. 

2 Programs and Manipulations 

Delta operates on FORTRAN 77 programs. To make them tractable, they are represented
internally as abstract syntax trees that suppress the lexical and syntactic quirks of Fortran.
In SETL, data objects can share storage under a discipline of ‘copy-on-write’. Each Fortran
program appears to be a separate SETL data object. Delta transformations take Fortran
programs as ‘call-by-value’ arguments and deliver revised programs as results. Memory
requirements do not multiply, since only the few substructures that change need new
storage.

Internally, Delta breaks a program into its: 

• Imperative statements

• Symbol table

• Initial data values

• Applicative expressions

• I/O format specifiers

• Common storage layout

• Storage equivalences

Each substructure collects and indexes one class of program components. The component 
descriptions are collections of named attributes. Some attributes link components together 
(by their names or indices) into semantic networks. Delta works by discovering and deriving 
facts about the program’s behavior when it is executed. Facts are added to the tree both 
as new top level structure and as annotations to low level components. 

An executing Fortran program produces a sequence of stores into memory cells,
references to stored values, and calculations creating new values. The program’s text may refer
to one storage cell in many different ways. The cells of an array are identified by subscripts 
which are integer arithmetic formulae. Symbolic algebra and Diophantine analysis can be 
used to test whether two subscript formulae ever intersect. Where they do, the two uses of 
that array may involve the same storage cell. Such a pair, where at least one storage action 
is a write, forms a data dependence and requires that the two memory references occur 
in their original order. 
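One classic Diophantine check of this kind is the GCD test: references X(a1*i + b1) and X(a2*j + b2) can touch the same element only if a1*i - a2*j = b2 - b1 has an integer solution, which requires gcd(a1, a2) to divide b2 - b1. The C sketch below implements that test for one-dimensional subscripts. It illustrates the technique and is not necessarily the test Delta applies; because it ignores loop bounds it may report a dependence that never occurs, which is the safe (conservative) direction.

```c
#include <stdlib.h>

/* Greatest common divisor of |a| and |b| (Euclid's algorithm). */
static long gcd(long a, long b)
{
    a = labs(a);
    b = labs(b);
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* GCD test for references X(a1*i + b1) and X(a2*j + b2).
   Returns 1 if a dependence is possible, 0 if the references are
   provably independent. Loop bounds are ignored. */
int gcd_test(long a1, long b1, long a2, long b2)
{
    long g = gcd(a1, a2);
    long diff = b2 - b1;
    if (g == 0)              /* both subscripts are constants */
        return diff == 0;
    return diff % g == 0;
}
```

For example, X(2*i) and X(2*j + 1) are independent (odd versus even elements), while X(2*i) and X(4*j + 6) may overlap.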

The sequencing of storage actions is captured in data dependence graphs, a control flow 
graph, and a subroutine call graph. The graphs summarize how the parts of the program 
cooperate to achieve its net result. These graphs are examined before most program changes
to verify that the transformed code will be equivalent to the original. As the program is
changed, these graphs are updated or regenerated to reflect the current organization of the 
program. The incremental cost to do this is small because optimizing transformations tend 
to preserve most storage relationships. 

Parallelism can be recognized in a sequential program as a pattern of data and control 
independence. Parallelizing is the process of producing these patterns by modifying loop 
structure, introducing auxiliary storage cells, and reorganizing calculations to avoid small 
cycles of dependence which can only be supported by serial loops. 

Today the Delta system includes sufficient preconditioning, analysis, and transformation 
components to parallelize and restructure many example programs. It can permute the 
nesting order of a collection of loops, distribute loops into vector form, or split them into 
parallel and serial pieces. It can normalize them, stripmine them, or reverse their iteration 
spaces. It recognizes scalar inductions carried by a single loop, scalar variables local to a 
single loop, summations, and doalls. 
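Stripmining, for example, splits one loop into an outer loop over fixed-length strips and an inner loop over a single strip, which is how a long loop is matched to a vector length. Delta applies the transformation to Fortran; the sketch below shows the same before/after shape in C, with a hypothetical `saxpy` kernel and strip length `strip` (assumed greater than zero).

```c
#include <stddef.h>

/* Original loop: y[i] += a * x[i] for i in [0, n). */
void saxpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Strip-mined form: the outer loop walks strips of length `strip`
   (e.g. a vector register length); the inner loop covers one strip,
   with the last strip possibly shorter. Same iterations, same order. */
void saxpy_stripmined(size_t n, size_t strip, double a,
                      const double *x, double *y)
{
    for (size_t lo = 0; lo < n; lo += strip) {
        size_t hi = (lo + strip < n) ? lo + strip : n;
        for (size_t i = lo; i < hi; i++)
            y[i] += a * x[i];
    }
}
```

Because the iteration order is unchanged, stripmining is always legal; its value comes from exposing the inner loop as a unit that can then be vectorized or run in parallel.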

In the next half year we will extend Delta’s parallelization techniques by collecting 
dependence cycle breakers: particular transforms, triggered by the appearance of a circular 
path in the data or control dependence graph, and designed to break the cycle. Some of these 
cycles are easily broken by recognizing which variables are loop invariant, linearly varying, 
or localizable. Such properties of a loop are discovered by the preconditioning passes already 
built for Delta and appear as annotations to the internal program representation for later 
passes to use. 

3 Environmental Support 
3.1 SETL

For now, any serious user of Delta must become a SETL programmer. Fortunately most 
programmers can intuit the basic principles of SETL by imagining a cross between Algol 
control constructs, Set Theory notation, and Lisp recursive data structures. A key to the 
power of SETL is the flexibility of sets and tuples for representing data structures. It is
especially important for Delta programmers to understand maps. A map is a set of ordered
pairs (i.e. 2-tuples). SETL allows a map to be applied to an argument like a function; the
result is the second element of the ordered pair whose first element matches the argument. 
For example, if we create a map from the first four integers to their names: 

> number_to_name := {[1, "one"], [2, "two"], [3, "three"], [4, "four"]};
then we can use this variable like a function:

> number_to_name(1);

"one";

If the argument is not in the domain of the map (i.e. the set of first elements of the ordered
pairs), the mapping operation returns OM. If more than one ordered pair has the same
first element, then the map is referred to as multi-valued. A special form of the mapping 
operation, using curly braces instead of parentheses, will return the set of all second elements 
of ordered pairs in the map whose first element matches the argument: 
> number_to_name := number_to_name union {[1, "uno"]};

> number_to_name{1};

{"uno", "one"};

> number_to_name{2};

{"two"};

A mapping operation with parentheses is illegal for members of the domain with multiple 
values. The test for this error occurs at runtime. The choice of which algorithm to use to 
perform the mapping is also made at runtime. Very little of SETL's syntax is devoted to
specifying implementation details. Runtime choices are expensive. They are avoidable in a
commercial restructurer, but are welcome in Delta because they reduce the amount of text
that must be changed to revise a design choice. 
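For readers without a SETL interpreter at hand, the two mapping operations can be modelled in a few lines of Python. In this sketch a map is a set of pairs. `f(x)` returns the unique image, returns OM (modelled here as `None`) outside the domain, and raises an error at runtime when `x` is multi-valued. `f{x}` returns the whole image set. The class and method names are our own; the linear scan in `image` also hints at why deferring the choice of lookup algorithm to runtime is costly.

```python
OM = None  # stands in for SETL's undefined value


class SetlMap:
    """A set of ordered pairs that can be applied like a SETL map (illustrative only)."""

    def __init__(self, pairs=()):
        self.pairs = set(pairs)

    def union(self, other_pairs):
        return SetlMap(self.pairs | set(other_pairs))

    def image(self, x):
        # f{x}: the set of all second elements whose first element is x
        return {b for (a, b) in self.pairs if a == x}

    def __call__(self, x):
        # f(x): OM outside the domain, runtime error if x is multi-valued
        values = self.image(x)
        if not values:
            return OM
        if len(values) > 1:
            raise ValueError(f"map is multi-valued at {x!r}")
        return next(iter(values))


number_to_name = SetlMap([(1, "one"), (2, "two"), (3, "three"), (4, "four")])
assert number_to_name(1) == "one"
number_to_name = number_to_name.union([(1, "uno")])
assert number_to_name.image(1) == {"uno", "one"}
assert number_to_name.image(2) == {"two"}
```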

3.2 Interactive Delta 

A typical Delta development session might start out as follows. First the interpreter is 
started and it reads all of the Delta source code: 

shell% idelta

DELTA Program Manipulator Last update: Feb 14 16:09 
(c) 1991, Board of Trustees, Univ. of Illinois (CSRD) 

ISETL 2.0 Last updated on 89/12/12 at 13:18:09. 

(c) Copyright 1987,1988,1989 Gary Levin 
Enter Iquit to exit. 

Current GC memory = 50080, New Limit = 4000000
Current GC memory = 3996384, Limit = 4000000 

> matmul := read_program("matmul.f");

> display_program(matmul);

SUBROUTINE MATMUL(A,B,C,N)
S2   DO I = 1, N, 1
S3     DO J = 1, N, 1
S4       X = 0.0
S5       DO K = 1, N, 1
S7         X = X+B(I,K)*C(K,J)
S8       ENDDO
S10      A(I,J) = X
S11    ENDDO
S12  ENDDO
S13  RETURN
     END

OM;

Function read_program invokes the Delta scanner (a separate program, written in C) on its
filename argument. The scanner produces a SETL data structure that completely describes
the program. Read_program loads this structure as a variable within the iSETL session,
annotates it with its control flow graph and variable cross reference, and returns the whole
package as its functional result. Here we assigned it to the variable ‘matmul’. A call to
display_program lists out the executable statements of ‘matmul’ in FORTRAN form.

The ‘>’ is an iSETL prompt; a statement typed here will execute and have its value
printed. The value of display_program was undefined, which SETL treats as a constant
called ‘OM’ (for Omega or omitted). We can begin to examine matmul as a SETL map by
asking: 

> domain(matmul);

{"statements", "initial_statement", "final_statement",
"expression", "loop_info", "routine_type", "symtab"};

In more complex programs we might also see substructures for "common_blocks",
"equivalences", and other FORTRAN features.

Compiler authors rely on utility functions to abbreviate most data accesses. For example,
one query function in Delta is called stmts_of_type. It returns a tuple of names of
statements of a given type, in the order that they appear in the program. If lexical order is
not important, the raw SETL needed to acquire the same subset is almost as brief:

> stmts_of_type(matmul, "DO");

["S2", "S3", "S5"];

> all_stmts := matmul("statements");

> {stmt : attr = all_stmts(stmt) | attr("st") = "DO"};

{"S2", "S5", "S3"};
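The difference between the ordered query and the raw set former can be mirrored in Python with nested dictionaries. This is a sketch under stated assumptions: the `statements` map below is a hand-made, much-reduced stand-in for `matmul("statements")` with only the "st" field, and the `stmts_of_type` shown is our own version, not Delta's. Lexical order is kept in a separate list, because a SETL map (like a set) carries no order of its own.

```python
# Hypothetical reduced stand-in for matmul("statements"):
# statement name -> attribute map, with only the "st" (statement type) field.
statements = {
    "S1": {"st": "ENTRY"},
    "S2": {"st": "DO"},
    "S3": {"st": "DO"},
    "S4": {"st": "ASSIGNMENT"},
    "S5": {"st": "DO"},
    "S7": {"st": "ASSIGNMENT"},
    "S8": {"st": "ENDDO"},
    "S10": {"st": "ASSIGNMENT"},
    "S11": {"st": "ENDDO"},
    "S12": {"st": "ENDDO"},
    "S13": {"st": "RETURN"},
}

# Program order, kept separately (dicts preserve insertion order in Python 3.7+).
lexical_order = list(statements)


def stmts_of_type(stmts, order, st):
    """Names of statements of type `st`, in program order, returned as a tuple."""
    return tuple(s for s in order if stmts[s]["st"] == st)


# Unordered equivalent of { stmt : attr = all_stmts(stmt) | attr("st") = "DO" }
do_set = {s for s, attr in statements.items() if attr["st"] == "DO"}
```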

At the top level of compiler construction, transformation and analysis functions are 
more common. Here we have composed many Delta steps into a ‘precondition’ function, 
which returns a heavily annotated version of its argument. We then apply an experimental 
vectorizer to the annotated program and put the parallelized result into a separate iSETL 
variable called ‘matmul_vector’.

> matmul_vector := tiny_vectorizer(precondition(matmul));

S2 is a DOALL

S3 is a DOALL

S5 is a summation

We have chosen to express parallelism as an annotation to a fundamentally serial program. 
Preserving the sequential view of the program’s semantics means that sequential analyses 
are still applicable to the transformed program. 

> display_program(matmul_vector);

SUBROUTINE MATMUL(A,B,C,N)
S2   DO I = 1, N, 1 {DOALL}
S3     DO J = 1, N, 1 {DOALL}
S4       X = 0.0
S5       DO K = 1, N, 1 {SUM}
S7         X = X+B(I,K)*C(K,J)
S8       ENDDO
S10      A(I,J) = X
S11    ENDDO
S12  ENDDO
S13  RETURN
     END

OM;

> print_graph(matmul_vector, "S5", OM);

Output Dependence with Direction [=, =, <] from S7[26]: X to S7[26]: X

Flow Dependence with Direction [=, =, <] from S7[26]: X to S7[27]: X

Antidependence with Direction [=, <] from S7[27]: X to S7[26]: X

OM; 

On the other hand, most restructuring transformations also modify the serial program. 
Changes to the lexical and control flow graphs are evident in the program display. 

> display_program (stripmine (matmul_vector, "S3", 32)); 

S3 stripmined into [S3, ST2] 

SUBROUTINE MATMUL(A,B,C,N)
S2   DO I = 1, N, 1 {DOALL}
S3     DO J1 = 1, N, 32 {DOALL}
ST2      DO J = J1, MIN(N,31+J1), 1 {DOALL}
S4         X = 0.0
S5         DO K = 1, N, 1 {SUM}
S7           X = X+B(I,K)*C(K,J)
S8         ENDDO
S10        A(I,J) = X
ST3      ENDDO
S11    ENDDO
S12  ENDDO
S13  RETURN
     END

OM; 
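Stripmining is legal here because it only regroups the J iterations; it does not reorder them. A quick Python check of that claim, for the strip size 32 used on S3 in the listing above, compares the J values visited by the original loop and by the stripmined pair. The function names are our own.

```python
def original_js(n):
    """J values of DO J = 1, N, 1."""
    return list(range(1, n + 1))


def stripmined_js(n, strip=32):
    """J values of the stripmined pair:
         DO J1 = 1, N, strip
           DO J = J1, MIN(N, strip - 1 + J1), 1
    """
    visited = []
    for j1 in range(1, n + 1, strip):                    # strip loop (S3)
        for j in range(j1, min(n, strip - 1 + j1) + 1):  # one strip of at most `strip` values (ST2)
            visited.append(j)
    return visited


for n in (0, 1, 31, 32, 33, 100):
    assert stripmined_js(n) == original_js(n)
```

The MIN in the inner bound handles the final, possibly partial strip when N is not a multiple of 32.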

An interactive Delta session allows a compiler developer to explore potential algorithms 
by !include’ing experimental software and watching its effects on actual programs. The
growing repertoire of transformers, instrumenters, program analyses and displays makes the
interactive system a versatile laboratory for developing new compiler technology. 

Using iSETL within Delta 

Delta uses maps (and occasionally multi-valued maps) where other compilers would use
structured data. For instance, each FORTRAN compilation unit is a map. When we ask for
matmul("statements"), the result is a map from statement names onto other maps that
describe the attributes of each statement. For example, the right-hand side of the assignment
in statement S7 is:

> format_expr(matmul, matmul("statements")("S7")("rhs"), true);

"X+B(I,K)*C(K,J)";

Another statement attribute is its FORTRAN type (spelled "st"). To explore the concept 
of SETL maps, let’s quickly build, and then display, a map from statement names directly 
to their type: 
> stmt_types := {[s, attr("st")] : [s, attr] in matmul("statements")};

> stmt_types("S2");

"DO";

Here s and attr are iteration variables; they become bound to the elements of each ordered
pair in the statement map of matmul. Attr is the map of the attributes of statement
s; attr("st") is the one we need. As s and attr iterate over all the statements,
{[s, attr("st")] : ...} becomes a set of ordered pairs: a new mapping.

SETL can iterate over maps, sets, or tuples to form other maps, sets, or tuples, or to do
more traditional forms of data processing. In the following (truncated) example, the ‘>>’
prompt indicates that iSETL is waiting for more text in a syntactically incomplete construct
(in this case a conventional ‘for’ loop).

> for t = stmt_types(s) do
>> writeln s, " is ", if t(1) in "AEIOU" then "an " else "a " end if, t;
>> end for;

S12 is an ENDDO
S13 is a RETURN
S4 is an ASSIGNMENT
S2 is a DO
S1 is an ENTRY


A SETL Extension: @fieldname

Since we make such extensive use of strings in the domain of maps, we have extended iSETL
to streamline the syntax for mapping strings. The syntax resembles function composition:
@xxx@yyy[z] finds the location "xxx" in the map called "yyy" in the map stored in variable
z. This notation may be used for both storage to and retrieval from a hierarchical dataset.

> @st@S4@statements[matmul];

"ASSIGNMENT";

> stmt_types := {[s, @st[attr]] : [s, attr] in @statements[matmul]};

> @S2[stmt_types];

"DO";
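The @-path notation reads right to left: @xxx@yyy[z] means z("yyy")("xxx"). The sketch below imitates both the retrieval and the storage form on nested Python dictionaries. `at_get` and `at_set` are hypothetical helper names. Unlike SETL's value semantics, `at_set` updates the Python structure in place.

```python
def at_get(path, z):
    """@f1@f2...@fk[z] == z(fk)...(f2)(f1): apply field names right to left; None plays OM."""
    node = z
    for field in reversed(path):
        node = node.get(field) if isinstance(node, dict) else None
        if node is None:
            return None
    return node


def at_set(path, z, value):
    """Storage form @f1@f2...@fk[z] := value, creating intermediate maps as needed."""
    node = z
    for field in reversed(path[1:]):
        node = node.setdefault(field, {})
    node[path[0]] = value


pgm = {"statements": {"S4": {"st": "ASSIGNMENT"}, "S2": {"st": "DO"}}}
assert at_get(["st", "S4", "statements"], pgm) == "ASSIGNMENT"   # @st@S4@statements[pgm]
at_set(["st", "S9", "statements"], pgm, "CONTINUE")              # @st@S9@statements[pgm] := ...
```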

3.3 Batch Processed Delta

iSETL is not the fastest implementation of SETL. Production runs of Delta on large programs 
are not practical using the public domain interpreter that makes development so easy. To 
overcome this problem, we acquired the SETL2 compiler from the Courant Institute [Sny90] for
our workstations. In many cases, the compiled form of Delta has proven to be between 4 
and 20 times faster than interactive iSETL interpretation. This enables us to process the 
entire Perfect Benchmark suite through a Delta experiment as an overnight batch job. 
iSETL → SETL2 conversion

The SETL2 compiler and the iSETL interpreter differ in many interesting respects. We have
avoided substantial parts of both languages in an attempt to keep a single source code for 
Delta in a fairly standard core dialect. The remaining differences are bridged by writing in 
a subset of iSETL and converting iSETL to setl2 before compiling. The custom conversion 
program written for this purpose does an excellent job of preserving indentation, vertical 
alignment, comments and other ergonomic aspects of the code. This has left open the option 
of converting to SETL2 as the primary development language. 

main := func(args); 

This is the main program for Delta. ‘Args’ is a tuple containing the parameters passed on 
the command line. 

The iSETL interpreter is the ‘main’ program for an interactive session, but compiled 
code needs a preprogrammed sequence of commands. So far each experimenter has written 
a custom version of ‘main’ to carry out the desired tests. Some speculation has gone into 
the design of an interactive front-end for compiled Delta, either an interface to a source 
browser cum editor, or a SETL2 interpreter. While this issue is ultimately important in
building the Delta user base, it has so far taken a back seat to the construction of basic 
transformation tools. 

3.4 Version Control 

The Delta program source resides in a production directory with a full audit trail, and
represents a usable release of the Delta system. This serves as a backbone for independent
development directories kept by several project programmers. The contents of a development
directory are overlaid on the current production directory to produce experimental
releases. In support of this, two scripts have been written (idelta and cdelta) to produce
the composition of a developer’s private directory with the public production directory and 
invoke either the iSETL interpreter or the SETL2 compiler on the result. 

In addition special checkout and checkin scripts allow developers to move files between 
their directories and the production directory. These scripts check for potential conflicts 
between a known group of developers. They also maintain the audit trail and backup copies 
of recent work. 

3.5 Cross Reference 

A system is only “open” insofar as its components are easy to locate, understand and
reuse. To enhance this quality of the Delta project, several tools have been added to the
SETL programming environment. Together they provide an interactive cross-reference to
the Delta project source files, fully integrated with the GNU Emacs editor. The components
of this system are:

A SETL editing mode teaches Emacs enough about the lexical structure of SETL to
make its cursor motion, editing, and search commands recognize token boundaries.

A Tagsfile generator builds a catalogue of source locations where SETL identifiers are
given new definitions. The tagsfile locates all statically recognizable definition sites using
the various forms of declaration and side-effects in the iSETL dialect used in the Delta source
code. A single tagsfile covers the whole Delta source. The file adheres to an Emacs format
previously used for Fortran, C, yacc, and Eye.

The GNU Emacs Tags functions have been enhanced to support structured code walkthrough.
While editing any part of the Delta source it is now easy to visit all definitions
or uses of any function, variable, or field name. A stack of return locations is kept so 
that when the identifier’s meaning has been sufficiently explored (or modified) the editing 
session can return to the spot from which it first departed. The return stack includes 
the remaining itinerary of any searches currently in progress. By directly modelling the 
call/return discipline this package supports code walk-through and makes it easy to validate 
changes to the code. 

The Call Graph: One feature of an “open” system is the ability to replace lower level
components to change the detailed behavior of higher level actions. Of course, the
replacement must fill the needs of all its higher level callers. A call graph is a summary of
component interrelationships that can be used to locate call sites of a given component.
The tagsfile generator can also produce the call graph for a collection of iSETL modules.
Module relationships can also be abstracted. Observing and quantifying the cross-module
references has led to better choices about module boundaries in Delta.
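Before a lower-level component is replaced, a developer needs every routine that depends on it, not only the direct call sites. A minimal Python sketch of that query inverts a caller-to-callees map and then walks it transitively. The graph below is made up for illustration; it is not Delta's actual call graph, and the tagsfile generator's own output format is not shown here.

```python
from collections import defaultdict


def callers_of(call_graph):
    """Invert a caller -> set-of-callees map into callee -> set-of-callers."""
    inverse = defaultdict(set)
    for caller, callees in call_graph.items():
        for callee in callees:
            inverse[callee].add(caller)
    return dict(inverse)


def all_callers(call_graph, component):
    """Every routine that can reach `component` through one or more calls."""
    inverse = callers_of(call_graph)
    seen, work = set(), [component]
    while work:
        for caller in inverse.get(work.pop(), ()):
            if caller not in seen:          # `seen` also protects against recursion cycles
                seen.add(caller)
                work.append(caller)
    return seen


# Hypothetical call edges, for illustration only.
graph = {
    "main": {"precondition", "tiny_vectorizer"},
    "precondition": {"simplify_args_tree"},
    "tiny_vectorizer": {"simplify_args_tree"},
    "simplify_args_tree": {"combine_args_tree"},
}
```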

4 Research Involvement 

Subscript Classification (Paul Petersen) 

One of the ways to improve data dependence information is to expand the applicability of 
the dependence tests to a larger percentage of the potential dependences. Classifying the 
sources of the unknown dependences is useful in determining where further effort may prove 
beneficial. 

One experiment examined all of the unknown dependence arcs in a benchmark suite and
categorized them based on the type of coefficients of the loop indices. In each category the
precedence was {Array, Variant, Invariant, Numeric}. If two or more different classification 
types were present in the same part of the subscript pair, the one with the higher precedence 
was chosen. Each group of coefficients was subdivided into four categories based on the types 
of their constant terms. By constant term we here mean any additive term not containing 
an index variable of some enclosing loop. The following functions were added to the Delta 
system: 

classify_args_tree := func(pgm, ex, invar);

Return a string of one-character labels which classify a subscript expression:

A = array               P = subroutine parameter
C = common variable     V = generic variable
F = function            X = unknown construct
I = invariant           0 = zero
N = numeric             1 = unity

sort_classify_set := func(res);

Return a sorted character string for the elements of set ‘res’.

classify_subscript := func(pgm, s1, exp1, s2, exp2, invar);
classify_subscript_pair := func(pgm, s1, exp1, s2, exp2, invar, ndirs);

Join the classification sets of each subscript and create a string tag to describe the
dependence pair; record this.

partially_linear := func(pgm, s1, exp1, s2, exp2);

Determine if the dependence pair ‘exp1’, ‘exp2’ are partially linear with all coefficients
of index variables constant.
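The bookkeeping behind the table that follows can be summarized in a short Python sketch. When several classes occur in one part of a subscript pair, the highest-precedence class wins (Array, then Variant, Invariant, Numeric). Each unknown arc then adds its number of feasible directions to the (coefficient class, constant class) cell. The input format and function names are assumptions made for illustration; this is not the Delta implementation.

```python
from collections import Counter

# Precedence stated in the text, highest first.
PRECEDENCE = ("Array", "Variant", "Invariant", "Numeric")


def dominant(types):
    """Highest-precedence classification present in one part of a subscript pair."""
    for t in PRECEDENCE:
        if t in types:
            return t
    raise ValueError("empty classification")


def tally(arcs):
    """arcs: (coefficient types, constant-term types, feasible directions) per unknown arc.

    Returns weighted counts keyed by (coefficient class, constant class).
    """
    counts = Counter()
    for coef_types, const_types, ndirs in arcs:
        counts[(dominant(coef_types), dominant(const_types))] += ndirs
    return counts
```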


                       Coefficient Type
Constant      Numeric   Invariant   Variant   Array
Numeric         15722        1079         6     499
Invariant        2908        4940         -       -
Variant         29492        3083        73      36
Array           25240          78         -     425


The above table summarizes this analysis. In each category we collected a weighted
count of how many distinct dependence arcs could not be analyzed due to the lack of
compile-time information characteristic of the category. The weighting factor equals the number
of feasible directions of the potential dependence. The table ranks the most important
sources of unknown dependence as:

1. variables that may be modified unpredictably during a loop, 

2. subscripted subscripts, 

3. loop invariants whose relation to the other terms in the subscript equations is not 
known. 

Compilation techniques such as interprocedural analysis and advanced induction variable
recognition can help to reduce the first category. The problem with subscripted subscripts
is more challenging and is usually resolved by the user asserting that the subscripting
array is a permutation. Reducing the third category involves more complex analysis and
propagation of known relationships between invariant variables.

Efficacy of Dependence Tests (Paul Petersen) 

Despite the popularity of approximate data dependence tests, there has been little empirical
analysis of their effectiveness. One research project [PP90] based on Delta analyzed some
approximate tests including the GCD method and three variants of Banerjee's test.

• To evaluate the accuracy of these tests, their outcomes were compared with an exact
integer programming method.

• To evaluate their effectiveness, the Perfect Benchmark suite programs were processed,
one subroutine at a time, through a Delta-based testbed system.
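For readers unfamiliar with the approximate tests, here are textbook forms of two of them for a single linear dependence equation a1*x1 + ... + an*xn = c. The GCD test rules out dependence when gcd(a1, ..., an) does not divide c. A Banerjee-style bounds test with rectangular loop limits and no direction constraints rules it out when c lies outside the range the left side can take. These Python versions are simplified illustrations, not the Delta implementations evaluated in [PP90]. Each returns False only when independence is proven.

```python
from functools import reduce
from math import gcd


def gcd_test(coeffs, c):
    """False if a1*x1 + ... + an*xn = c has no integer solution (independence proven)."""
    g = reduce(gcd, (abs(a) for a in coeffs), 0)
    if g == 0:
        return c == 0          # all coefficients zero: dependent only if c == 0
    return c % g == 0


def banerjee_rectangular(coeffs, bounds, c):
    """False if c is outside [min, max] of sum(a*x) with lo <= x <= hi (real-valued relaxation)."""
    low = sum(a * (lo if a >= 0 else hi) for a, (lo, hi) in zip(coeffs, bounds))
    high = sum(a * (hi if a >= 0 else lo) for a, (lo, hi) in zip(coeffs, bounds))
    return low <= c <= high


# A(2*I) vs A(2*J+1): 2*i - 2*j = 1 -> gcd 2 does not divide 1, so independent.
assert gcd_test([2, -2], 1) is False
# A(I) vs A(J+10), 1 <= I, J <= 5: i - j = 10 lies outside [-4, 4], so independent.
assert banerjee_rectangular([1, -1], [(1, 5), (1, 5)], 10) is False
```

The second test needs the loop limits; the GCD test does not, which is one reason the experiments below distinguish known from assumed bounds.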
Two experiments were run using different sequences of dependence tests. Each potentially
conflicting subscript pair was classified as described in ‘Subscript Classification’ above. The
dependence test sequence applies only to subscript equations whose coefficients and constant
terms are known at compile time. A counter associated with each dependence test was
incremented whenever that test was the first to detect independence. For Banerjee’s tests
and the integer programming test the increment was 1. The Constant, GCD, and integer
programming test counters were incremented by the number of distinct direction vectors
of the potential dependence since their results cover all dependences between the subscript
pair. Counting in this way, the weight of each potential dependence grows exponentially
with its level of nesting within DO loops.
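The exponential growth comes from the direction vectors themselves. Each of the d loops enclosing both references contributes one of "<", "=", ">", so there are at most 3^d candidate vectors. The Python sketch below enumerates them. The `feasible` predicate is a hypothetical hook standing in for whatever pruning removes impossible directions; it is not taken from the experiment.

```python
from itertools import product


def direction_vectors(depth):
    """All candidate direction vectors for a reference pair nested in `depth` common loops."""
    return list(product("<=>", repeat=depth))


def direction_weight(depth, feasible=lambda d: True):
    """Counter increment for a test whose verdict covers every feasible direction at once."""
    return sum(1 for d in direction_vectors(depth) if feasible(d))


# 1, 3, 9, 27 candidate vectors for nesting depths 0..3
assert [len(direction_vectors(d)) for d in range(4)] == [1, 3, 9, 27]
```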

Each of the two experiments consisted of two parts. The first part used the loop limits as 
they appear in the source program. In this case, the Banerjee Rectangular and Trapezoidal
tests and the integer programming test do not apply to all loops since they require that
the limits be known at compile time. For the second part we assumed arbitrary constant
lower and upper loop limits and unit stride.

Did ‘knowing’ the loop bounds help much? The Banerjee Rectangular Test became more
effective by 8.64%, but at the same time the Banerjee Infinity Test became less effective by
8.37%, for a net gain of 0.27%. The reordering of the dependence tests between runs also illustrated
that only 0.53% of the analyzable dependences needed to know the upper bounds of loops 
to resolve the equations. Bounds information may play a larger role with more advanced 
induction recognition, but does not by itself improve dependence testing. 

Exact Integer Programming proved only 0.25% more accurate than the approximate 
tests across the whole benchmark. These results point to improving the quality of the 
information available at a potential dependence site as the most significant research goal in 
parallelizing FORTRAN. 

Synchronization (Sam Midkiff) 

The Delta system is being used as the implementation tool for two experiments concerning
synchronization in shared memory multiprocessors. The first (partly implemented)
experiment compares the effectiveness of several code generation techniques for optimizing
synchronization instructions [Jay88, Li85, MP87]. Each of the optimization methods
is being implemented in Delta. These ‘synchronization minimizers’ postprocess the result
of a simple doacross pass [Cyt86]. Doacross loops are partially parallel: they satisfy
dependences between different iterations by synchronizing the parallel processors so that
conflicting memory uses occur in their original serial order. Code has been prepared to insert
post/wait and Alliant FX advance/await [All85] synchronization instructions into
concurrent loops. Statistics will be collected on how much redundant synchronization is left
after using each optimization method.
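To make the doacross idea concrete, the Python sketch below runs the recurrence x(i) = x(i-1) + b(i) with one thread per iteration and one post/wait flag per iteration. Python threads and `threading.Event` objects only stand in for processors and synchronization variables. This is an illustration of the post/wait discipline, not the code Delta generates and not the Alliant advance/await mechanism.

```python
import threading


def doacross_prefix_sum(b):
    """x[i] = x[i-1] + b[i], run as a doacross loop with one post/wait flag per iteration."""
    n = len(b)
    x = [0] * n
    posted = [threading.Event() for _ in range(n)]   # post(i) == posted[i].set()

    def iteration(i):
        # Work that does not depend on earlier iterations could overlap here.
        if i > 0:
            posted[i - 1].wait()                     # wait(i-1): x[i-1] has been written
        x[i] = (x[i - 1] if i > 0 else 0) + b[i]     # the loop-carried statement
        posted[i].set()                              # post(i)

    threads = [threading.Thread(target=iteration, args=(i,)) for i in range(n)]
    for t in reversed(threads):                      # start out of order on purpose
        t.start()
    for t in threads:
        t.join()
    return x
```

Every iteration here posts exactly once. A synchronization minimizer would look for posts and waits that are implied by others and remove them.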

The second experiment is still in the planning stage. Its goal is to compare several 
synchronization technologies: 

• Post and Wait, 

• Advance and Await, and 

• Process-based synchronization^ Y88]. 
Using Delta, FORTRAN programs will be analyzed to determine how many bits of synchronization
data are necessary to synchronize the programs using each of the methods. A
compiler-generated timer will be used to simulate the potential speedup of each
synchronization method. This will let us study the tradeoffs between parallel speedup and the
complexity of synchronization hardware.

Critical Path Length (Paul Petersen) 

Using the same powerful notion of instrumenting a program in order to simulate an alternative
model of its execution, an experiment is nearing completion to measure a program’s
execution time under ideal conditions. Like the MAXPAR simulation package [Che89], code
is being added to serial programs so that when executed, an assessment can be made of
their potential parallel execution times. So far, the work within Delta has involved reproducing
some of the MAXPAR results as a cross-validation exercise. It is notable that the
instrumentation approach made possible by Delta significantly lowers the time needed to
acquire simulation data for a program.

The overall goal of the experiment is to characterize the performance of a particular 
run of a given program if only it could be compiled with perfect knowledge of the control 
and data dependences that arise during the run, and perfectly scheduled for execution 
by a parallel processor with no resource constraints. The metric we are most interested in 
initially is the operation count along the critical path of actual data and control dependences 
encountered in the run. By studying the wide gap between theoretically ideal and practically 
achievable compilation of actual application codes, this experiment can help set priorities 
and expose unforeseen opportunities for optimization efforts.
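The instrumentation idea can be sketched with "shadow" timestamps. Alongside each variable, the instrumented program records the earliest time its current value could exist. Each operation finishes one step after the latest of its inputs, and the critical path is the largest timestamp seen. The Python version below replays a trace of (target, operands) pairs instead of instrumenting FORTRAN. It assumes unit-cost operations, unlimited processors, and that only true (flow) data dependences constrain the schedule; control dependences and renaming costs are ignored, so it is a simplification of the experiment described above.

```python
def critical_path(trace):
    """trace: (target, operands) per operation, in serial execution order.

    Returns (critical path length, total operation count).
    """
    ready = {}      # shadow value: earliest time each variable's current value exists
    longest = 0
    for target, operands in trace:
        t = 1 + max((ready.get(v, 0) for v in operands), default=0)
        ready[target] = t          # a later write does not wait for earlier reads (renaming)
        longest = max(longest, t)
    return longest, len(trace)


# Inner product for one (I, J) of MATMUL with N = 2, one "operation" per statement:
trace = [("X", []), ("X", ["X", "B11", "C11"]), ("X", ["X", "B12", "C21"]), ("A11", ["X"])]
assert critical_path(trace) == (4, 4)    # the summation chain is fully serial
```

The ratio of total operations to critical path length gives the ideal average parallelism of the run.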

Work in Progress 

Several lines of development are now under consideration or already underway: 

Data structures are being designed to represent the parallelized program, without losing 
the original sequential semantics needed for most analyses of the program. 

The loop transformations on perfect loop nests should probably be extended and unified 
into a single step transformer along the lines suggested by [Ban90] and [WL90]. 

Modules will soon be needed to estimate the resources consumed by a transformed program 
if it were to execute on a given machine architecture. Specifically memory references 
should be counted and classified into different access patterns (e.g., vector write to 
private memory, or synchronized reduction to shared memory). 

Demonstrations 

The Delta Program Manipulation system was demonstrated outside CSRD:

• at the Fall ’90 DARPA contractors meeting in Chapel Hill, NC;

• at the Supercomputing ’90 Conference in New York City.

The audiences for these demonstrations were gently introduced to the whole topic of automatic
program restructuring. On display were both the Delta project source code as seen
through the interactive cross reference, and the interactive “try-it-and-see” interface. Loop
Distribution, Interchange, Concurrentizing, Vectorizing and Stripmining were illustrated.
A The Delta System

The following is a collection of brief descriptions for the majority of the functions currently 
available in Delta. Every function which can be invoked from the top level of Delta has 
been included; this has tended to mix high level and low level functions. To relieve this, the 
functions are classified by topic and some discussion has been added outside the framework 
of the function-by-function documentation. 

A.1 Fortran Programs as Data
The Scanner 

The conversion FORTRAN 77 → Delta internal form is carried out by a separate program
called the ‘scanner’. For each compilation unit in the FORTRAN source file, the scanner
produces one SETL map. To Delta, this map is the program. Its hierarchical structure
captures every significant semantic detail in the FORTRAN 77 source code, allowing it to be
accessed based upon its meaning, rather than its lexical structure.

run_scanner := func(name, output);

Run the FORTRAN 77 to Delta conversion program. Supply the input and output file
names to be used.

read_program := func(file_name);

Given a file name, return the Delta form of the first compilation unit found there.

For the remainder of this report and throughout the Delta source code, the variable name
‘pgm’ always refers to a program object of the type produced by ‘read_program’.
The @expression Table

Delta represents FORTRAN expressions as labelled trees. The scanner produces a separate
table (@expression) containing the parse tree of each expression in the program. The
trees are formed with explicit links (indices into @expression). Each tree node is a map
whose domain is a self-explanatory collection of field names, for example:

{["op", "+"], ["args", [123, 124]], ["type", "INTEGER"]}.

In this example, the left argument to ‘+’ is found at @expression[pgm](123).

All expression nodes (except ‘,’ operators) have a @type field for the data type of the
operator’s result. Currently the data frame size is kept only in the symbol table and not
propagated throughout the expression tree. All expression nodes have an @op field. They
may also have fields called @name, @label, @value, or @args. The following table
describes how to interpret the @op field.


@op                   
INTEGER_CONSTANT      @value = integer
REAL_CONSTANT         @value = ‘real number’
STRING_CONSTANT       @value = ‘characters’
LOGICAL_CONSTANT      @value = ‘.TRUE.’ or ‘.FALSE.’
HOLLERITH_CONSTANT    @value = ‘H length characters’
COMPLEX               @args = [real, imaginary]
ARRAY_REF             @args = [array ID, subscript list]
SUBSTRING             @args = [base variable reference, substring bounds]
FUNCTION_CALL         @args = [function ID, parameter list]
INTRINSIC_CALL        @args = [intrinsic ID, parameter list]
RETURN*               @args = [label] (in CALL parameter lists)
                      @args omitted (in ENTRY parameter lists)
LABEL                 @label = ‘statement name’
IO*                   (implied I/O unit or Format)
ID                    @name = ‘identifier’
  annotations:        @substituted = equivalent expression
                      @possible_values = {integers}
                      @value = integer
U+ U- NOT             @args = [right]
EQ NE LT LE GT GE     @args = [left, right]
- / ** //             @args = [left, right]
+ *                   @args = [arg1, arg2, ..., argn]
OR AND EQV NEQV       @args = [arg1, arg2, ..., argn]
DO                    @args = [iolist, iterator] (for I/O implied DO loops only)
                      @args = [index ID, iteration space] (for implied DO iterators only)
,                     @args = [arg1, arg2, ..., argn] (for parameter lists, subscript
                      lists, I/O lists, implied DO iteration spaces, ...)


Explicit links from an expression to its subexpressions (@args) are very useful when a
modification must affect a subexpression without worrying about where it actually occurs
in the program (or how many occurrences may be sharing the subexpression structure).

repl_expr := func(pgm, e, old_value, new_value);

Return a new copy of ‘pgm’ in which all occurrences in expression ‘e’ of the subexpression
‘old_value’ are (destructively) replaced by unique copies of subexpression
‘new_value’. The substitution is not recursive. It affects @substituted expressions,
but does not consider these when trying to match occurrences of ‘old_value’. It modifies
expression ‘e’, without garbage collecting the subexpressions ‘old_value’, or copying ‘e’.

replace_variable_uses := func(pgm, stmt, prototype_map);

find_and_replace := func(pgm, e, prototype_map);

Return a new version of ‘pgm’ in which every ID node that is in the domain of
‘prototype_map’, and in expression ‘e’ (or in any expression of ‘stmt’) is replaced by a copy
of its image in ‘prototype_map’. Identity maps are allowed.

This modification assumes there is no structure sharing and does not introduce any. IDs
in @substituted expressions are not affected. The @substituted, @possible_values,
and other annotation fields of replaced nodes are preserved.

copy_expression := func(pgm, e);

Duplicate expression ‘e’; return [new pgm, duplicate’s index in @expression].

make_expr_node := func(pgm, node_contents);

Return a new copy of ‘pgm’ and the index of a new expression therein which has the
given ‘node_contents’.

But explicit links and two-part results are a nuisance when the expression must be
reorganized, simplified, or repeatedly copied. For these actions, Delta expression transformers
first ‘implode’ an expression, which makes a copy independent of the @expression table.
The imploded form, called an args_tree, replaces the pointers of the @args field with the
actual nodes to which they pointed. Args_trees are easier to manipulate because SETL
automatically garbage collects unused nodes, and copies modified ones, and because they can
be modified independent of any particular expression table. A reverse process (‘exploding’)
embeds an entirely new copy of the expression tree within a given @expression table.

args_tree_implode_expression := func(pgm, e);

Extract an expression from the @expression table of ‘pgm’. Return a nested form of
the expression that is independent of the expression table.

explode_args_tree := func(pgm, ex);

Given a tree form expression, return a new copy of ‘pgm’ with the tree’s components
inserted into the expression table. Return [pgm, the index of the root table entry].
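The round trip between the linked table and the nested args_tree form can be sketched in Python. Here the expression table is a dict from integer index to node dict, and "args" holds child indices. The functions `implode` and `explode` are our own simplified analogues of the two Delta functions above, with Python dicts standing in for SETL maps. As in Delta, exploding returns a new table and the root index rather than modifying the original.

```python
import copy


def implode(table, e):
    """args_tree form of expression e: @args indices replaced by the nodes themselves."""
    node = copy.deepcopy({k: v for k, v in table[e].items() if k != "args"})
    if "args" in table[e]:
        node["args"] = [implode(table, a) for a in table[e]["args"]]
    return node


def explode(table, tree):
    """Embed a fresh copy of `tree` in a copy of `table`; return (new_table, root_index)."""
    new_table = dict(table)                  # the caller's table is not modified

    def insert(t):
        node = copy.deepcopy({k: v for k, v in t.items() if k != "args"})
        if "args" in t:
            node["args"] = [insert(a) for a in t["args"]]   # children get indices first
        index = max(new_table, default=0) + 1
        new_table[index] = node
        return index

    return new_table, insert(tree)


table = {
    123: {"op": "ID", "name": "I", "type": "INTEGER"},
    124: {"op": "INTEGER_CONSTANT", "value": 1, "type": "INTEGER"},
    125: {"op": "+", "args": [123, 124], "type": "INTEGER"},
}
tree = implode(table, 125)                   # {'op': '+', 'type': ..., 'args': [{...}, {...}]}
```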

form_args_tree := func(op, Type, left, right);

Create a new ‘tree’ node.

SUB := func(left, right);
ADD := func(left, right);
MUL := func(left, right);
DIV := func(left, right);

Create a new arithmetic node in args_tree form.

COMMA := func(args);

Create a new ‘,’ node for parameter lists, etc.

ARRAY := func(name, args);

Create a new ARRAY_REF node.

list_subscripts := func(pgm, ref);

Return a tuple of the subscript expression indices from an ARRAY_REF.

CST := func(val);

NEG := func(ex);

Create a constant node whose type is determined by the type of the argument.

result_type := func(op, left_type, right_type);

Find the resulting type from the given binary operation.
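A function like result_type typically encodes the FORTRAN 77 rules for the type of a binary expression. Relational and logical operators yield LOGICAL, concatenation yields CHARACTER, and arithmetic takes the operand type of higher rank (INTEGER < REAL < DOUBLE PRECISION, with COMPLEX highest; standard FORTRAN 77 does not allow DOUBLE PRECISION to be combined with COMPLEX). The Python version below is a sketch of those rules. Delta's own function may handle more cases or use different type spellings.

```python
ARITHMETIC_RANK = {"INTEGER": 0, "REAL": 1, "DOUBLE PRECISION": 2, "COMPLEX": 3}
RELATIONAL = {"EQ", "NE", "LT", "LE", "GT", "GE"}
LOGICAL_OPS = {"AND", "OR", "EQV", "NEQV"}


def result_type(op, left_type, right_type):
    """Type of `left op right` under the usual FORTRAN 77 rules (simplified sketch)."""
    if op in RELATIONAL or op in LOGICAL_OPS:
        return "LOGICAL"
    if op == "//":
        return "CHARACTER"
    if {left_type, right_type} == {"DOUBLE PRECISION", "COMPLEX"}:
        raise TypeError("DOUBLE PRECISION combined with COMPLEX is not standard FORTRAN 77")
    # +, -, *, /, **: the operand of higher rank determines the result type
    return max(left_type, right_type, key=ARITHMETIC_RANK.__getitem__)
```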

IS_CST := func( ex ); 

IS-ZERO := func( ex ); 

Determine if the expression tree node is a constant value. 

COPYJTD := func( id-name, id-type ); 

MAKE_ID := func( pgm, id_name ); 

Create an args-tree ID node. 

IF IX := func( child ); 

DYADIC JFN func( fn, left, right ); 

ASSOC-FN := func( fn, left, right ); 

Create a new intrinsic_call tree node. 

maximize_args_tuple := func( tuple );

Given a tuple of (non-negative) trees return the tree representation of the MAX over 
the list. 

side_effect_free := func(pgm, e);
side_effect_free_args_tree := func(ex);

Return true if the given expression has no side-effects (or is OM). 

invariant_args_tree := func(ex, invar);

Given a side_effect_free_args_tree ‘ex’, return true if all its free variables are in ‘invar’.

equal_expressions := func(pgm, p, q);
equal_args_tree := func( p, q ); 

Are the two given expressions/trees structurally equivalent? 
Algebraic simplification
simplify_args_tree := func( ex );

Return an arithmetically equivalent expression tree created by applying the following 
simplification transformations in the correct order. 

coef_args_tree := func( ex, id );

Return the numeric coefficient of the variable ‘id’ in expression ‘ex’. If the coefficient is
not numeric, return OM.

all_variable_factors := func( ex ); 

Returns the set of variables that are contained in the formula of expression tree ‘ex’. 

contains_set_args_tree := func( ex, var_list );

Returns the subset of variables in ‘var_list’ that are contained in the top-level formula
of expression tree ‘ex’.

extract_args_tree := func( ex, var_list );

Given ‘ex’ (a term or sum of terms) and a list of variable names, decompose ‘ex’ into 
a mapping from the names onto either 0, or a term or sum of terms that mention the 
variable. 

variable_factors := func( ex ); 

Returns the set of variables found as factors of ‘ex’. 

member_args_tree := func( ex, v );

Given a tree ‘ex’ that is either an identifier ‘v’ or a multiple thereof, return a subtree
of ‘ex’ equal to ‘v’; otherwise return OM.

take_from_args_tree := func( ex, id ); 

Return the coefficients of the identifier ‘id’ from the multiplicative expression ‘ex’. 

combine_args_tree := func( ex ); 

Combine the coefficients of common terms and simplify. 

dist_times_args_tree := func( args );
distribute_args_tree := func( ex ); 

Distribute multiplication over addition.

eval_func_args_tree := func( ex ); 

Return a simplified form of the INTRINSIC_CALL ‘ex’.

flatten_args_tree := func( ex ); 

Flatten the tree structure around associative operators. 

fold_args_tree := func( ex );

Combine constant arguments at every node. Use the @substituted annotations to
replace ID nodes. Replace subtraction (X-Y) by the addition of the negation (X + -1*Y).
Perform simple symbolic simplifications such as (1*X) → X and (0*X) → 0.
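The folding rules just listed can be sketched in Python over a toy tree encoding. This is not Delta's args_tree layout: ints stand for constants, strings for IDs, and (op, left, right) tuples for binary nodes, all hypothetical. A plain dictionary plays the role of the @substituted annotations.

```python
from typing import Dict, Optional, Union

# Hypothetical encoding: int = constant, str = ID, (op, left, right) = node.
Tree = Union[int, str, tuple]


def fold(ex: Tree, substituted: Optional[Dict[str, Tree]] = None) -> Tree:
    substituted = substituted or {}
    if isinstance(ex, int):
        return ex
    if isinstance(ex, str):
        # Replace an ID by its substituted value (assumed acyclic), if any.
        return fold(substituted[ex], substituted) if ex in substituted else ex
    op, left, right = ex
    if op == "-":
        # X - Y  becomes  X + (-1 * Y)
        return fold(("+", left, ("*", -1, right)), substituted)
    if op not in ("+", "*"):
        raise ValueError(f"unsupported operator {op!r}")
    left, right = fold(left, substituted), fold(right, substituted)
    if isinstance(left, int) and isinstance(right, int):
        return left + right if op == "+" else left * right  # constant node
    if op == "*":
        if left == 0 or right == 0:
            return 0          # (0*X) -> 0
        if left == 1:
            return right      # (1*X) -> X
        if right == 1:
            return left
    elif left == 0:
        return right          # (0+X) -> X
    elif right == 0:
        return left
    return (op, left, right)
```

Rewriting subtraction first means later passes only need to handle ‘+’ and ‘*’.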
negate_args_tree := func( ex );

Given a tree that has recently been folded, multiply it by -1, in a way which leaves it 
as fully folded. 

The @statement table

Statement type                        Fields
ASSIGNMENT                            @lhs¹, @rhs¹
DO                                    @follow², @index¹, @init_expr¹, @limit_expr¹, @step_expr¹
IF, ELSEIF                            @expr¹, @follow²
ENDDO, ENDTHEN                        @follow²
GOTO                                  @target²
ARITHMETIC_IF, COMPUTED_GOTO,         @expr¹, @label_list³ = [target₁², ...]
  ASSIGNED_GOTO
ASSIGN                                @lhs¹, @target²
READ, WRITE, PRINT, OPEN, CLOSE,      @s_control = {[‘KEYWORD’, expr¹], ...}, @io_list¹
  REWIND, BACKSPACE, ENDFILE,
  INQUIRE
STOP, PAUSE                           @expr¹,³
CALL, ENTRY                           @routine = ‘identifier’, @parameters³
RETURN                                @expr¹,³
label                                 @label = integer
ENDIF

Table 1: Fields in the Abstract Syntax Tree for each Statement type

The executable statements of a program are collected in the @statements map. To
describe the many characteristics of the different FORTRAN statement types, each statement
is represented by a labelled map. The following subfields appear in every statement:

@st = ‘statement type’

@next = ‘name of the lexically next statement’

Table 1 lists all the FORTRAN 77 statement types (@st) and the subfields used to capture
their syntax and semantics.


¹ Index into the expression table.

² ‘statement name’.

³ Omitted in some cases.
Cross Reference, Flow Graph, Loop Nesting

One field provided by the scanner for some statement types helps Delta deduce the flow of 
control for compound constructs: 

@follow = name of the ‘other statement’ in a compound construct

For instance, DO/ENDDO statement pairs point to each other via the @follow field. For
IF, ELSEIF and ELSE statements, @follow points to the matching ELSE, ELSEIF, or ENDIF.
Each IF, ELSE and ELSEIF clause is ended by an ENDTHEN statement, whose @follow points
to the ENDIF that terminates the whole mess. All one-line logical IFs are converted to
IF/THEN/stmt/ENDTHEN/ENDIF sequences by the scanner. FORTRAN statement labels are
attached to separate statements of type ‘label’. They are just placeholders and branch
targets.

All other fields are derived information, added by Delta during its program setup phase: 
setup := func(pgm); 

Return a copy of ‘pgm’ annotated with the derived fields. 

@in_refs = {indices of memory read expressions}

@out_refs = {indices of memory write expressions}

Each @expression index in these sets is either an ID, an ARRAY_REF, or a SUBSTRING.
@in_refs and @out_refs do not account for the ‘hidden’ side-effects of function and
subroutine calls. These annotations are handled by the routines:

build_in_out_refs := func(pgm, s);
list_refs := func(pgm, s);

Add (or, for list_refs, list) the @in_refs and @out_refs expression sets of statement ‘s’.
add_refs_to_program := func(pgm); 

Return a copy of ‘pgm’ where all statements have valid @in_refs and @out_refs fields.

@prev = ‘name of the lexically previous statement’

Of course, the program counter does not always flow into a statement from its @prev
statement, but so far every programmer who has written a low-level transformation has
erroneously assumed that it does at least once. These bugs have been fixed, but this
misconception about the @prev of a statement is subtle and insidious.

@outer = ‘statement name’ of the innermost enclosing DO loop.

@outer is omitted for statements outside of loops. The @outer field of a DO statement
names the next outer loop, not itself. The @outer field of an ENDDO names the matching
DO.

@successors = {‘statement name’s to which control may flow}

@predecessors = {‘statement name’s from which control may arrive}

For example, most DO statements have two elements in @successors: the first statement
in the body of the loop and the statement following the matching ENDDO.
flow_successors := func(pgm, stmt);

Return the set of statement names of ‘pgm’ to which control may flow from ‘stmt’.
flow_predecessors := func(pgm, stmt);

Return the set of statement names of ‘pgm’ from which control may flow into ‘stmt’.
This requires O(#statements[pgm]) time to compute, so use the precomputed
@predecessors field wherever possible.

A recurring issue in Delta is how much of the rich internal program representation must be
rebuilt from scratch after each transformation. The flow graph is one example where coding
complexity in the transformations must be traded off against the high cost of regenerating
(via add_flow_graph) all the flow linkages in a program. The routine update_flow_info is one
answer to this trade-off. It works on a bounded section of the program, provided that any
explicit (i.e. non-default) links crossing the boundary are already correct. Most
transformations can meet these conditions without sacrificing clarity or code space.

add_flow_graph := func(pgm);

Return a copy of ‘pgm’ where each statement has been annotated with @successors,
@predecessors, and @prev, ignoring any existing values of those fields. The result
also includes a @final_statement field that points to the lexically last statement.

update_flow_info := func(pgm, start_stmt, end_stmt);

Return a copy of ‘pgm’ where the @predecessors, @successors, and @outer fields
have been updated over the given range of statements (connected by @next).
Statements that are properly linked successors of statements in this range have their
predecessor links updated to reflect changes within the range. The same is not true of
properly linked predecessors, so beware of ranges that follow ELSEs and ENDIFs.

A.2 Analysis and Transformation
Data Dependence 

Each pair of conflicting storage references is represented by a directed arc in a data
dependence graph that indicates which reference is executed first in the original
sequential program. The function dependence_graph() generates a complete graph for a
nest of loops and saves the result in a program annotation (@loop_info). From this,
dependence_loop_info() can derive a graph for any inner loop of the nest.

dependence_graph := func(pgm, doloop);

Build the dependence graph for the given loop. Return a set of tuples, each of which
is a dependence with the following contents, in order:

1. source statement 

2. sink statement 

3. source atom expression number 

4. sink atom expression number 

5. variable causing the dependence 

6. direction vector 
7. dependence type (‘f’ = flow, ‘a’ = anti, ‘o’ = output)
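For readers less familiar with the three dependence types, the following Python sketch models the seven-element dependence tuple as a named tuple (the field names are hypothetical; only the order follows the list above) and classifies a conflicting pair of references using the usual definitions, where the source executes first.

```python
from typing import NamedTuple, Tuple


class Dependence(NamedTuple):
    # Field order follows the seven-element list above; names are illustrative.
    source_stmt: str
    sink_stmt: str
    source_atom: int            # expression index of the source reference
    sink_atom: int              # expression index of the sink reference
    variable: str
    direction: Tuple[str, ...]  # one entry per common loop, outermost first
    dep_type: str               # 'f', 'a' or 'o'


def dependence_type(source_writes: bool, sink_writes: bool) -> str:
    """Classify a conflicting pair of references; the source executes first."""
    if source_writes and sink_writes:
        return "o"  # output: write, then write
    if source_writes:
        return "f"  # flow: write, then read
    if sink_writes:
        return "a"  # anti: read, then write
    raise ValueError("two reads do not conflict")
```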

derive_dependence := func( pgm, graph, s2 ); 

Given a dependence ‘graph’ for loop ‘s1’, return the dependence graph for loop ‘s2’, which
is a subloop of ‘s1’.

dependence_loop_info := func( pgm, s1 );

Annotate the @loop_info database with the dependence graph for loop ‘s1’. Derive
the graph from that of an enclosing loop, or compute it directly for the outermost loop.

print_graph := func(pgm, l, var_id);

Pretty-print the dependence graph for a loop, using the graph stored in @loop_info.
If ‘var_id’ is given, print only the dependences for that variable.

dependences := func(pgm, s1, s2, doloop);

Build the dependences between the two given statements. 

intersect := func(pgm, s1, s1_refs, s2, s2_refs, dirs, dep_type);

Intersect the two given reference lists and return a dependence for each intersection 
and each element of the set of directions given. 

plausible_directions := func(pgm, s1, s2, doloop);

Return a set of plausible direction vectors for statements S1 and S2. Each direction
vector is a tuple whose first element corresponds to the outermost common loop; each
element is a string containing one or more of the characters ‘<’, ‘=’, or ‘>’.

ignore_non_equal := func( graph, ignore ); 

Given a set of variables ‘ignore’ (e.g. localizable scalars), exclude from the graph
the dependences on those variables whose directions are not ‘=’.

The computation-intensive part of data dependence analysis decides whether a system
of equations (the equated subscripts of two array accesses) has an integer solution in a
given region of Z^n (the iteration space of the surrounding loops). Integer programming
techniques [SM89] can answer this question exactly. However, faster approximate
techniques [Ban88] are thought to be more practical. An approximate test will sometimes
predict a nonexistent solution to the system of equations, but conservative use of the test
results never leads to incorrect code, just to missed opportunities for optimization.

same_test := func(pgm, s1, e1, s2, e2, dirs, dep_type);
dd_tests := func(pgm, s1, exp1, s2, exp2, dirs, proto);

This routine is called only if the lexical names of ref1 and ref2 are identical. It returns
false if it can be shown that a dependence does not exist, and true otherwise.

nonlinear_test := func( pgm, s1, exp1, s2, exp2, dirs );

This test counts the number of non-linear potential dependences. 

dd_tree_tests := func(pgm, s1, exp1, s2, exp2, dir, proto);

Expand the direction-vector tree until a true/false result is found. If the result at the 
current node is unspecified then recurse down. 
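The expansion can be pictured with the following Python sketch, which is not Delta's code. It assumes ‘*’ marks an unrefined position, refines the leftmost ‘*’ into ‘<’, ‘=’ and ‘>’, and takes a hypothetical `test` callback standing in for the dependence tests: it returns False (independent), True (dependence assumed) or None (unspecified).

```python
from typing import Callable, List, Optional, Tuple

Dir = Tuple[str, ...]


def tree_test(dirs: Dir, test: Callable[[Dir], Optional[bool]]) -> List[Dir]:
    """Return the direction vectors that may still carry a dependence."""
    result = test(dirs)
    if result is False:
        return []        # this whole subtree is independent
    if result is True or "*" not in dirs:
        return [dirs]    # definite, or cannot refine: keep conservatively
    k = dirs.index("*")  # refine the leftmost unrefined position
    found: List[Dir] = []
    for d in "<=>":
        found += tree_test(dirs[:k] + (d,) + dirs[k + 1:], test)
    return found
```

A definite answer at an interior node prunes the whole subtree below it, which is what makes the hierarchical search cheaper than testing every fully refined vector.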
dd_simple_tests := func(pgm, s1, exp1, s2, exp2, dirs, dir_count);
constant_dd_test := func( pgm, s1, exp1, s2, exp2, dirs );

gcd_dd_test := func( pgm, s1, exp1, s2, exp2, dirs );

Examine the @simple_dd_tests options and invoke each routine in sequence until one
of the tests returns a definite true or false.
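For reference, the classical GCD test behind a routine like gcd_dd_test can be sketched in Python. The equation a0 + Σ a_k·i_k = b0 + Σ b_k·j_k has an integer solution only if the gcd of all the index coefficients divides b0 - a0. Loop bounds are ignored, so a True result only means a dependence is still possible. The coefficient-list interface is an assumption, not Delta's.

```python
from functools import reduce
from math import gcd
from typing import Sequence


def gcd_test(a0: int, a: Sequence[int], b0: int, b: Sequence[int]) -> bool:
    """False if a0 + sum(a_k*i_k) == b0 + sum(b_k*j_k) has no integer
    solution; True means a dependence cannot be ruled out."""
    g = reduce(gcd, list(a) + list(b), 0)
    if g == 0:
        return a0 == b0  # both subscripts are constants
    return (b0 - a0) % g == 0
```

For example, A(2*I) and A(2*I+1) never overlap: gcd(2, 2) = 2 does not divide 1.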

dd_complex_tests := func(pgm, s1, exp1, s2, exp2, dir, dir_count);
infinity_dd_test := func( pgm, s1, exp1, s2, exp2, dir );
banerjee_dd_test := func( pgm, s1, exp1, s2, exp2, dir );
trapazoid_dd_test := func( pgm, s1, exp1, s2, exp2, dir );
int_prog_dd_test := func( pgm, s1, exp1, s2, exp2, dir );

display_dd_result := func( name, exp1, exp2, dir );

Examine the @complex_dd_tests options and invoke each routine in sequence until
one of the tests returns a definite true or false.

jayasimha_test := func( pgm, s1, exp1, s2, exp2, dirs );

This test locates subscript pairs in which all coefficients are integral but some are
different.

exact_subscript_test := func(pgm, s1, exp1, s2, exp2, directions);

Invoke the exact linear programming dependence test.

trapazoid_direction_test := func(pgm, s1, ex1, s2, ex2, dirs);
trapazoid_function_bounds := func(pgm, s1, ex1);

Find the function bounds for expression ‘ex1’ at statement ‘s1’. The value OM is
returned upon error.

banerjee_trapazoid := func(pgm, s1, ex1, s2, ex2);

Determine the function bounds for two expressions ‘ex1’ and ‘ex2’ in statements ‘s1’
and ‘s2’. If the constant term does not lie between the computed bounds, then no
dependence is possible.

banerjee_quadrant := func(pgm, s1, ex1, s2, ex2, directions);

banerjee_quadrant is called if both subscript expressions are linear functions of
induction variables. This routine will only work properly when all coefficients are integers
and the loop has been normalized.

infinity_test := func(pgm, s1, ex1, s2, ex2, directions);

infinity_test is called if both subscript expressions are linear functions of induction
variables. This routine will only work properly when all coefficients are integers and
the steps of the loops are all positive.

banerjee_inequality := func(pgm, s1, ex1, s2, ex2, directions);

banerjee_inequality is called if both subscript expressions are linear functions of
induction variables. This routine will only work properly when all coefficients are integers
and all loop limits are integers. It also assumes that the loop has been normalized and
has a step of 1.
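A simplified Python sketch of the bounds reasoning follows, covering only the case where every direction position is ‘*’. The per-direction refinement of Banerjee's inequality is not shown. It assumes normalized loops with index k running over 0..upper[k] and a coefficient-list interface, both hypothetical. A dependence requires Σ a_k·i_k - Σ b_k·j_k = b0 - a0, so the test checks whether b0 - a0 lies between the extreme values of the left-hand side.

```python
from typing import Sequence


def banerjee_star(a0: int, a: Sequence[int], b0: int, b: Sequence[int],
                  upper: Sequence[int]) -> bool:
    """Bounds test with '*' in every direction position.

    Index k of each reference ranges over 0..upper[k]. Returns False when
    b0 - a0 lies outside the range of sum(a_k*i_k) - sum(b_k*j_k).
    """
    lo = hi = 0
    for ak, bk, n in zip(a, b, upper):
        # a_k*i_k and b_k*j_k reach their extremes at index 0 or n.
        lo += min(0, ak * n) - max(0, bk * n)
        hi += max(0, ak * n) - min(0, bk * n)
    return lo <= b0 - a0 <= hi
```

For A(I) and A(I+20) with I in 0..9, the difference of the index terms lies in [-9, 9], so the constant 20 rules out any dependence.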
In posing the integer programming problem, unknowns that are invariant in some
surrounding loops are symbolically cancelled. The function cancel_common_terms(ex1,
ex2, vars) eliminates variables that are additive in both subscript expressions (ex1, ex2)
and also belong to the invariant set (vars) of a given loop. The particular loop to use is
the outermost one in which the conflicting references may occur in different iterations.

select_invariant := func( pgm, common, dir );

select_invariant_dirs := func( pgm, common, dirs );

Select the set of invariant variables based on the current direction vector. Given a set 
of direction vectors, return the most restrictive set of invariant variables. 

cancel_common_terms := func( ex1, ex2, invar );

Given two expressions ‘ex1’ and ‘ex2’, return a tuple of the expressions with all common
additive terms in ‘invar’ cancelled.
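A minimal Python sketch of the cancellation follows, assuming linear expressions are encoded as dictionaries from variable name to integer coefficient, with the constant under the key ''. The encoding is hypothetical. It also assumes that "common" means the same variable with the same coefficient on both sides.

```python
from typing import Dict, Set, Tuple

Linear = Dict[str, int]  # variable name -> coefficient; '' holds the constant


def cancel_common_terms(ex1: Linear, ex2: Linear,
                        invar: Set[str]) -> Tuple[Linear, Linear]:
    """Drop additive terms that appear with the same coefficient in both
    expressions and whose variable is invariant (in 'invar')."""
    common = {v for v in invar if v in ex1 and ex1.get(v) == ex2.get(v)}
    return ({v: c for v, c in ex1.items() if v not in common},
            {v: c for v, c in ex2.items() if v not in common})
```

With A(I+N) and A(I+N+1) in a loop where N is invariant, N cancels and the remaining system no longer mentions the unknown N.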

unknown_test := func(pgm, s1, exp1, s2, exp2, dirs, dir_count);

Test the subscript pair to see if any direction vector in ‘dirs’ supports a large enough 
invariant set to break an ‘unknown’ test result. 

symbolic_lower_bound := func( pgm, s, id, indx );
symbolic_upper_bound := func( pgm, s, id, indx ); 

Given DO loop ‘s’ in ‘pgm’ and the indices ‘indx’ of the enclosing DO loops, return a map
representing a linear function of ‘indx’ that computes either the lower (upper) bound of
‘s’ or, if ‘id’ is set, id - bound. If the function is nonlinear and @loop_info contains
a guess of the bound of ‘s’, use the guess in place of the actual bound.

A.3 Statement Manipulation
list_stmts := func(pgm); 

Return a list of all of the statements in the program.
stmts_of_type := func(pgm, st);

Return a tuple (ordered by the @next links) of all statements of the given types; ‘st’ 
can be either a string or a set of strings. 

reorder_statements := func(pgm, start_block, end_block, new_order);

Given a range of statements and a new order, rearrange the @next and @prev links to
put them in that order. Don’t update the flow graph or @outer fields.

connect_two_statements := func(pgm, first_stmt, second_stmt);

Set the @next and @prev fields of two statements to point to each other.

delete_stmt := func(pgm, s);

Disconnect a statement from the lexical and flow graphs of the program. The statement
remains in the statement table; its (invalid) links are unchanged. This is not sufficient
to delete pieces of a compound construct (DO, IF, GOTO/LABEL, ...).

add_after_stmt := func(pgm, preceding_stmt, new_stmt);

Insert a statement after another one. 
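The lexical chain that these routines maintain is a doubly linked list. The Python sketch below assumes a hypothetical statement map of the form name -> {'next': ..., 'prev': ...}. Unlike the SETL routines, which return modified copies of ‘pgm’, it updates the map in place. As with reorder_statements, the flow graph and @outer fields are not touched.

```python
from typing import Dict, List, Optional

# Hypothetical statement map: name -> {'next': name or None, 'prev': ...}
Stmts = Dict[str, Dict[str, Optional[str]]]


def connect_two_statements(stmts: Stmts, first: Optional[str],
                           second: Optional[str]) -> None:
    """Make 'first' and 'second' lexical neighbours (None marks an end)."""
    if first is not None:
        stmts[first]["next"] = second
    if second is not None:
        stmts[second]["prev"] = first


def add_after_stmt(stmts: Stmts, preceding: str, new: str) -> None:
    """Splice statement 'new' into the next/prev chain after 'preceding'."""
    following = stmts[preceding]["next"]
    stmts.setdefault(new, {"next": None, "prev": None})
    connect_two_statements(stmts, preceding, new)
    connect_two_statements(stmts, new, following)


def lexical_order(stmts: Stmts, start: str) -> List[str]:
    """Follow the next links from 'start' to the end of the chain."""
    order: List[str] = []
    s: Optional[str] = start
    while s is not None:
        order.append(s)
        s = stmts[s]["next"]
    return order
```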
make_assignment_stmt := func(pgm, p, q);

Create an assignment statement ‘p = q’. Don’t link it into the program.

Procedure CALLs 

The following utilities are concerned with CALL statements and the various forms of entry
points into FORTRAN compilation units. The most significant capability of Delta with
respect to interprocedural analysis is its inline expansion of subroutine calls. Given a
program block, a subroutine, and a statement in the program block that calls the subroutine,
the inline expander replaces the call statement with the body of the called subroutine. It
renames the variables used by the subroutine so that the expanded program block
is functionally equivalent to the original program block. This facility is expected to prove
useful in parallelizing loops where the presence of a subroutine call inhibits parallelization.
At a higher level of abstraction, it eliminates some of the need for interprocedural analysis
by removing subroutine calls altogether.

make_call_stmt := func(pgm, name, p);

Create a subroutine call statement, where p is a node for the parameters. Don’t 
link it into the program. 

call_stmts := func(pgm);

Builds a map from routine names onto non-empty sets of statements where they are 
CALLed. 

function_calls := func(pgm);

Find all function calls and return a mapping pointing to the statements wherein they 
occur. 

inline_expand := func(pgm1, call_stmt, pgm2, called_routine);

Expand the subroutine CALL at ‘call_stmt’ in ‘pgm1’ using the body of ‘called_routine’
(a named entry point) from ‘pgm2’.

routine_name := func(pgm);

Find the routine name of the lexically first ENTRY point, if any; if there is none,
return OM. Let the caller convert it to ‘MAIN’ or whatever!

entry_points := func(pgm);

Find all ENTRY points of a given program. Return a map from routine name to 
ENTRY statement tag for that name. 

float_entry_points := func(pgm);

Move all ENTRY statements to the top of the routine. Follow each one with a 
GOTO/LABEL pair to its original location. This produces a unique empty section of 
code following each entry point. Lexical ordering of ENTRYs is preserved, therefore 
the routine name is not changed. 

move_ENTRY_after := func(pgm, dest, entry);

Move a statement of type ENTRY to after dest. 
Branching and Conditionals
make_goto_stmt := func(pgm, st); 

Create an UNCONDITIONAL_GOTO statement that branches to statement st. Don’t
link it into the program. 

make_label_stmt := func(pgm, lb);

Create a LABEL statement with label lb. Don’t link it into the program. 

numeric_labels_used_in := func(pgm);

Find all numeric labels used in pgm. 

delete_trivial_gotos := func(pgm);

Remove UNCONDITIONAL_GOTOs that go to the next statement. Remove the
LABEL there too if possible.
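One way to read "if possible" is that the label is removed only when nothing else still branches to it. The Python sketch below works on a flat list of hypothetical (kind, operand) statements and, as a simplifying assumption, treats plain GOTOs as the only label references. A real implementation would also have to count arithmetic IFs, computed GOTOs and ASSIGN statements.

```python
from typing import List, Set, Tuple

Stmt = Tuple[str, str]  # hypothetical: ('GOTO', '10'), ('LABEL', '10'), ...


def delete_trivial_gotos(stmts: List[Stmt]) -> List[Stmt]:
    result: List[Stmt] = []
    dropped: Set[str] = set()  # labels whose trivial GOTO was removed
    for i, (kind, arg) in enumerate(stmts):
        nxt = stmts[i + 1] if i + 1 < len(stmts) else None
        if kind == "GOTO" and nxt == ("LABEL", arg):
            dropped.add(arg)   # control would fall through anyway
            continue
        result.append((kind, arg))
    # Remove a label only if no remaining GOTO still targets it.
    still_used = {arg for kind, arg in result if kind == "GOTO"}
    return [s for s in result
            if not (s[0] == "LABEL" and s[1] in dropped
                    and s[1] not in still_used)]
```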

remove_gotos := func(pgm);

Change any eligible IF-GOTO statements in the program to block IFs. This currently
uses an ad hoc approach; it is the future home of full flow normalization [Amm90].

convert_arithmetic_if := func(pgm, stmts, s);

Try to convert an arithmetic IF into an IF-GOTO-ENDTHEN-ENDIF. This is possible
if two of the target labels are the same and one of the targets is the following
statement. ‘stmts’ is the @statements section of pgm, passed separately for convenience.
If the conversion succeeds, return the modified program and statements and the FIRST
statement in the resulting sequence of statements; otherwise, just return the
arguments unchanged.
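The applicability condition and the resulting branch can be worked out case by case. FORTRAN's IF (e) l1, l2, l3 branches to l1, l2 or l3 when e is negative, zero or positive. The Python sketch below, a hypothetical helper rather than Delta's routine, returns the relation REL and target for the equivalent "IF (e .REL. 0) GOTO target", or None when the conversion does not apply.

```python
from typing import Optional, Tuple

# Complement of each relation, used when the lone target is the fall-through.
COMPLEMENT = {"LT": "GE", "EQ": "NE", "GT": "LE"}


def convert_arithmetic_if(l_neg: str, l_zero: str, l_pos: str,
                          following: str) -> Optional[Tuple[str, str]]:
    """IF (e) l_neg, l_zero, l_pos  ->  IF (e .REL. 0) GOTO target.

    'following' is the label of the statement after the arithmetic IF.
    """
    labels = {"LT": l_neg, "EQ": l_zero, "GT": l_pos}
    targets = list(labels.values())
    if len(set(targets)) != 2 or following not in targets:
        return None
    # Exactly one relation leads to a label that differs from the other two.
    odd = next(rel for rel, lab in labels.items() if targets.count(lab) == 1)
    if labels[odd] == following:
        # The lone target falls through: branch on the complement instead.
        other = next(lab for lab in targets if lab != following)
        return COMPLEMENT[odd], other
    return odd, labels[odd]
```

For example, IF (e) 10, 20, 20 followed by label 20 becomes IF (e .LT. 0) GOTO 10.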

invert_if_condition := func(pgm, if_stmt);

Invert the condition of the indicated IF statement.

split_ELSEIF := func(pgm, old_elseif);

Convert an ELSE IF clause into the ELSE clause of a new outer IF statement. 

delete_IF := func( pgm, IF_stmt );

Remove an IF/ELSE/ENDIF structure. Connect the THEN and ELSE clauses as 
in-line code. Caller is responsible for preserving program semantics! 

DO Loops 

list_do_loops := func(pgm); 

Return a tuple of [do_start, do_end] pairs.

display_do_loops := func(pgm, extra_stuff);

Write out the DO and ENDDO statements in the program.

list_loop_body := func(pgm, do_stmt);

Return a tuple listing the statements in the given DO loop.
enclosing_loops := func(pgm, s);

Build a tuple giving the loops enclosing the given statement, with the outermost loop 
first and innermost last. Return an empty tuple if there are no loops around this 
statement. 

common_loops := func(pgm, s1, s2);

Return a tuple of all DO loops that enclose both statements.

loop_index := func(pgm, l);

Return the name of the index variable for the given loop.
delete_loop := func(pgm, do_stmt);

Remove a DO loop from around a set of statements. Reconnect the lexical and
flow graphs of the program. The DO and ENDDO statements are deleted from the
@statement and @loop_info tables. The @outer fields are updated.

make_do_loop := func(pgm, after, index, init, limit, step, before);

Make a new DO/ENDDO statement pair. Put the DO after ‘after’, giving it the index
variable ‘index’ and bounds formed from the args_trees ‘init’, ‘limit’, and ‘step’. Put
the ENDDO before ‘before’. Statements between ‘after’ and ‘before’ (if any) form the
body of the loop. Their @outer field is updated, but their @loop_info is not (yet)
affected.

initialize_local_variable := func(pgm, s, v, expr);

Create a statement or loop to initialize a local variable.

A.4 Preconditioning

Control Flow 

control_flow_graph := func( pgm, stmt_list );

Return a control flow graph for the list of statements contained in ‘stmt_list’. The value
is a mapping of statements onto a set of arcs.

dominates := func( idom, si, s2 ); 

Return true iff ‘s1’ dominates ‘s2’.

all_immediate_dominators := func( pgm ); 
all_immediate_postdominators := func( pgm );

Returns a mapping from every statement in the program to its immediate (post)dominator
statement.

loop_immediate_dominators := func( pgm, doloop ); 
loop_immediate_postdominators := func( pgm, doloop );

Returns a mapping from every statement in doloop to its immediate (post)dominator 
statement. 

immediate_dominators := func( stmt_list, CFG, R_CFG );
immediate_postdominators := func( stmt_list, CFG, R_CFG );

Returns a mapping from statements in ‘stmt_list’ to their immediate dominator
statement: the closest statement, other than the statement itself, that appears on every
path from the entry to the statement. Tarjan’s flow dominator algorithm is used [LT79].

In the map, the entry idom(s) = s, where s is the first statement of ‘stmt_list’, is added
so that the resulting map is defined everywhere.
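With that convention, a query like dominates(idom, s1, s2) can be answered by walking up the immediate-dominator chain from ‘s2’. The Python sketch below assumes the idom map is a plain dictionary in which the first statement maps to itself, as described above. Every statement dominates itself.

```python
from typing import Dict, Hashable


def dominates(idom: Dict[Hashable, Hashable], s1: Hashable, s2: Hashable) -> bool:
    """True iff s1 dominates s2, walking the immediate-dominator chain of s2.

    Assumes the root maps to itself (idom[root] == root).
    """
    node = s2
    while True:
        if node == s1:
            return True
        parent = idom[node]
        if parent == node:  # reached the root without meeting s1
            return False
        node = parent
```

Each query costs time proportional to the depth of ‘s2’ in the dominator tree.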

dominance_frontier := func( idom, root, CFG );
control_dependence := func( pgm, stmt_list );

Given an immediate dominator map and the ‘root’ of its corresponding control flow
graph (or, for control_dependence, a ‘pgm’ from which a postdominator map can be
generated for a particular ‘stmt_list’), return a map from each statement X to all
statements Y such that X dominates a predecessor of Y (postdominates a successor of Y)
but does not strictly (post)dominate Y. This algorithm forms part of PTRAN’s Static
Single Assignment form conversion described in [CFR+88].
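The dominance-frontier definition can be checked against a short Python sketch. Note that this swaps in the "runner" formulation of Cooper, Harvey and Kennedy rather than the dominator-tree walk of [CFR+88]; both compute the same sets when ‘idom’ is correct. The CFG is assumed to map each node to its set of successors, and ‘idom’ follows the root-maps-to-itself convention above.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # node -> set of successors


def dominance_frontier(idom: Dict[Hashable, Hashable], root: Hashable,
                       cfg: Graph) -> Graph:
    """Map each node X to the nodes Y such that X dominates a predecessor
    of Y but does not strictly dominate Y."""
    assert idom[root] == root, "root must map to itself"
    preds: Graph = {n: set() for n in idom}
    for n, succs in cfg.items():
        for s in succs:
            if n in idom and s in idom:  # skip unreachable statements
                preds[s].add(n)
    df: Graph = {n: set() for n in idom}
    for b, ps in preds.items():
        for p in ps:
            runner = p
            # Walk up the dominator tree from p until reaching idom(b);
            # b lies in the frontier of every node passed on the way.
            while runner != idom[b]:
                df[runner].add(b)
                runner = idom[runner]
    return df
```

Computed on the reverse CFG with a postdominator map, the same routine yields the control dependences described above.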

invert_graph := func( graph );

Invert a graph represented as a mapping of items onto sets of items.

strongly_connected_components := func(stmts, graph, l);

Given a set of statements and a graph connecting them, return the strongly-connected 
components as a set of disjoint subsets of ‘stmts’. For now ‘graph’ is represented as 
a set of ordered pairs representing the directed edges between statements. Tarjan’s 
algorithm [Eve] is used. 
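A compact Python version of Tarjan's algorithm, taking the graph as a set of ordered (from, to) pairs as described above, might look like the sketch below. Edges with an endpoint outside ‘stmts’ are ignored, which is an assumption, and the loop argument of the SETL routine is omitted. The recursion is fine for loop bodies of modest size.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set, Tuple


def strongly_connected_components(
        stmts: Iterable[Hashable],
        graph: Set[Tuple[Hashable, Hashable]]) -> Set[FrozenSet[Hashable]]:
    """Tarjan's algorithm; returns the components as disjoint frozensets."""
    nodes = set(stmts)
    succ: Dict[Hashable, List[Hashable]] = {n: [] for n in nodes}
    for a, b in graph:
        if a in nodes and b in nodes:
            succ[a].append(b)
    index: Dict[Hashable, int] = {}
    low: Dict[Hashable, int] = {}
    stack: List[Hashable] = []
    on_stack: Set[Hashable] = set()
    comps: Set[FrozenSet[Hashable]] = set()

    def visit(v: Hashable) -> None:
        index[v] = low[v] = len(index)  # depth-first discovery order
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            comps.add(frozenset(comp))

    for v in nodes:
        if v not in index:
            visit(v)
    return comps
```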

build_pi_blocks := func(pgm, l);

Build the strongly-connected components (aka pi-blocks) for loop ‘l’.
topologically_sort := func(nodes, edges);

Given node and edge sets of a graph, return a tuple of nodes in an arbitrary total 
order which satisfies the partial ordering induced by the edge relation. The edge set 
must not have any cycles involving members of ‘nodes’. 
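One standard way to produce such an order is Kahn's algorithm, sketched here in Python and not necessarily the method Delta uses. It repeatedly emits a node with no remaining incoming edges. Edges with an endpoint outside ‘nodes’ are ignored, matching the requirement above, and a cycle among ‘nodes’ raises an error.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def topologically_sort(nodes: Iterable[Hashable],
                       edges: Set[Tuple[Hashable, Hashable]]) -> List[Hashable]:
    """Kahn's algorithm, using only edges whose ends are both in 'nodes'."""
    node_list = list(nodes)
    members = set(node_list)
    succ: Dict[Hashable, List[Hashable]] = {n: [] for n in node_list}
    indeg: Dict[Hashable, int] = {n: 0 for n in node_list}
    for a, b in edges:
        if a in members and b in members:
            succ[a].append(b)
            indeg[b] += 1
    ready = deque(n for n in node_list if indeg[n] == 0)
    order: List[Hashable] = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    if len(order) != len(node_list):
        raise ValueError("edge set has a cycle among 'nodes'")
    return order
```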

build_all_prior_sets := func(pgm);

Attach prior sets, as @prior, to each DO statement in the program.

new_prior := func(pgm, s1, s2, l);

Do the prior test for two statements with respect to a loop, i.e. return true if there
is a flow path from the first statement to the second within a single iteration of the
loop. Assume that the prior sets are attached to the loop header.

Reducing irrelevance 
dead_code_elimination := func(pgm); 

Eliminate dead (i.e. unused) assignment statements in ‘pgm’. Remove IFs and DOs 
that become empty along the way. 
delete_ELSEIF := func(pgm, ELSEIF_stmt);

Dead-code removal of ENDTHEN/ELSEIF pairs. Assumes the ELSEIF does not control
any meaningful statements. Does not remove any ELSE parts of the ELSEIF
construction. Semantically too weird for any use besides dead-code elimination.

unreachable_code_elimination := func(pgm); 

Delete statements that cannot be reached from the @initial_statement via @successors
links and that are not needed for the lexical integrity of the program. Also
delete statements disconnected from the lexical graph. Assumes that the reachable set
is a subset of the statements connected by @next and @prev links in the lexical graph.
Assumes (ANSI F77) that there are no branches into DO loops or into IF, ELSE, or
ELSEIF blocks.
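The first step, finding what is reachable, is an ordinary graph search over the successor links. The Python sketch below assumes a hypothetical map from statement names to successor sets. The lexical-integrity checks that decide which unreachable statements may actually be deleted are not shown.

```python
from typing import Dict, Set


def reachable(successors: Dict[str, Set[str]], initial: str) -> Set[str]:
    """Statements reachable from 'initial' by following successor links."""
    seen = {initial}
    work = [initial]
    while work:
        s = work.pop()
        for t in successors.get(s, ()):
            if t not in seen:
                seen.add(t)
                work.append(t)
    return seen
```

The deletion candidates are then all statements minus the reachable set.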

Induction 

induction_variables := func(pgm, doloop); 

Find induction variables in a loop. Replace uses of the variable with the equivalent 
value expressed as a tuple of expressions, where the first element is the constant term, 
the second the increment due to the inner loop, the next the increment due to the 
next inner loop, etc. 

test_for_increment := func(pgm, s, loop_invariants);

See if the given statement is an increment. If it is, return a tuple with the lhs
variable name and the set of rhs terms other than the lhs variable. Return OM if it is
not an increment.

not_always_executed := func(pgm, stmt, loop_body); 

Return TRUE if the given statement isn’t executed exactly once every time through 
the given loop. 

substitute_induction_variables := func(pgm, l);

Given a set of induction variable assignments for a loop, substitute uses of the variables
in loop ‘l’ with the equivalent value in terms of the (normalized) loop index variable.
Return the modified program and a count of variables removed.
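For a single loop, the substitution rests on a simple closed form. If K starts at k0 and is incremented by an invariant c once per iteration of a loop normalized to run I = 0, 1, ..., then a use before the increment in iteration I sees k0 + c·I, and a use after it sees k0 + c·(I+1). The Python sketch below, a hypothetical helper, checks this against direct simulation. The nested-loop tuple form described earlier generalizes this with one increment term per loop.

```python
from typing import List, Tuple


def closed_form(k0: int, c: int, i: int, after_increment: bool) -> int:
    """Value of K in normalized iteration i (0-based) of

        K = k0
        DO I = 0, N
           ...          (uses here see k0 + c*I)
           K = K + c
           ...          (uses here see k0 + c*(I+1))
        ENDDO
    """
    return k0 + c * (i + 1 if after_increment else i)


def simulate(k0: int, c: int, n: int) -> List[Tuple[int, int]]:
    """Record K just before and just after the increment in each iteration."""
    k, trace = k0, []
    for _ in range(n + 1):
        before = k
        k += c
        trace.append((before, k))
    return trace
```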

substitute_all_induction_variables := func(pgm);

Call substitute_induction_variables on each loop of the program. 

induction_loop_info := func( pgm, s1 );

Annotate the @loop_info database with the induction variables for loop ‘s1’.

Invariance 

invariant_expression := func(pgm, e, invariants);

Return true if all of the names that appear in the given expression are in the given
set. The expression is then invariant iff all its function calls and intrinsic calls are
deterministic (this is not checked).

loop_invariant_variables := func(pgm, l);

Return a set of variable names that are not modified in the given loop.
invariant_loop_info := func( pgm, s1 );

Annotate the @loop_info database with the invariant variables for loop ‘s1’.

Canonical iteration space 
normalize_loop := func(pgm, doloop);

To normalize a loop, subtract the lower bound from the upper and lower bounds,
rename the loop index variable to a new name, and substitute the new formula for all
occurrences of the old loop index variable. Locate all exits from the loop, and restore
the index value.
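The transformation just described turns DO I = L, U, S into DO I2 = 0, U-L, S and replaces every use of I by I2 + L. The Python sketch below checks that the index values visited are unchanged. It assumes a positive step, and the helper names are hypothetical. Loops with a non-unit step keep that step here, as in the description above.

```python
from typing import List


def fortran_do(lo: int, hi: int, step: int) -> List[int]:
    """Index values of DO I = lo, hi, step (step > 0 assumed)."""
    return list(range(lo, hi + 1, step))


def normalized(lo: int, hi: int, step: int) -> List[int]:
    """DO I2 = 0, hi - lo, step with every use of I replaced by I2 + lo."""
    return [i2 + lo for i2 in fortran_do(0, hi - lo, step)]
```

Because I2 + L equals I at every iteration, the exit code only has to assign I = I2 + L to restore the index value.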

rename_do_loop_index := func(pgm, doloop, index, repl);

Find all exit paths from loop ‘doloop’ and make sure that the assignment
‘index = new_value’ is added to each.

normalize_all_loops := func(pgm);

Normalize all of the loops in the program.

Forward substitution 

propagate_constants := func(pgm);
ok_to_substitute := func(pgm, e, OK_names);

Do forward substitution of scalar variables throughout pgm. Attach a subtree to each
substitutable expression giving either an equivalent expression tree (in @substituted)
or a set of possible constant values (in @possible_values). Expressions are substituted
only if they are free of function calls and array references.

evaluate_scalar_assignment := func(pgm, expr, stmt_IN, DEF_values);

Try to evaluate the expression given the scalar definitions that reach it. If a non- 
constant value reaches any of the rhs exprs, return OM, otherwise return the set of 
possible values to be attached to the statement. Return an empty set if evaluation 
will always be impossible. DEF_values is a mapping of DEFs to the set of values for 
that def. 

Alternative: use a simple evaluator that takes a binding environment as a parameter, and
collect the possible values by evaluating the expression under every combination of
bindings for its input variables.
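The alternative evaluator can be sketched in a few lines of Python using `itertools.product`. Here the expression is represented, purely for illustration, as a Python callable taking its input variables as keyword arguments, and each input maps to its set of possible values.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Set


def possible_values(expr: Callable[..., int],
                    inputs: Dict[str, Iterable[int]]) -> Set[int]:
    """Evaluate 'expr' under every combination of bindings for its inputs."""
    names = list(inputs)
    results: Set[int] = set()
    for combo in product(*(list(inputs[n]) for n in names)):
        results.add(expr(**dict(zip(names, combo))))
    return results
```

The number of evaluations is the product of the set sizes, so this is practical only while the possible-value sets stay small.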

evaluate_integer_expression := func(pgm,e); 

Try to evaluate an expression to an integer value. Return OM if it is impossible. 

clean_expressions := func(pgm); 

Remove the propagated constants and expressions from pgm. 

Scalar expansion 
localizable_scalars := func(pgm,l); 

Compute the set of scalars that could be localized to the given loop. Assumes that
@in_refs and @out_refs are correct.
localizable_loop_info := func( pgm, s1 );

Annotate the @loop_info database with the localizable variables for loop ‘s1’.

expand_scalars := func(pgm, names, enclosing_loops);

Expand each of a set of scalars into arrays. Add one dimension for each loop in the
enclosing_loops tuple (which is ordered with the outermost loop to be expanded first).
Dimension each array as (*, *, ...), filling in the bounds later. Assume that the loops are
normalized, and use the inner loop as the leftmost subscript.

A.5 Restructuring
Loop interchange 

permutation_to_swaps := func(permutation); 

Given a tuple representing a permutation, e.g. [3,1,2], return a tuple of ordered pairs
giving the exchanges of adjacent elements needed to change [1,2,3] into that
permutation. For [3,1,2], one possible sequence of exchanges is [[2,3], [1,2]].
The strategy is to swap the element that belongs in the last position into place (in the
example, exchanging positions 2 and 3 brings 2 to the end, then exchanging positions 1
and 2 brings 1 to its final position), then repeat as necessary for the second-to-last
position, and so on. Note that this is O(n^2), like a bubble sort (which it strongly resembles).
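The strategy above can be sketched in Python as follows. Positions in the returned pairs are 1-based, as in the [3,1,2] example. The elements already placed to the right of the current position are never disturbed, so each target element only ever moves rightwards.

```python
from typing import List, Sequence


def permutation_to_swaps(permutation: Sequence[int]) -> List[List[int]]:
    """Adjacent exchanges (1-based positions) that turn [1..n] into 'permutation'."""
    n = len(permutation)
    current = list(range(1, n + 1))
    swaps: List[List[int]] = []
    for pos in range(n - 1, -1, -1):  # fill the last position first
        j = current.index(permutation[pos])
        while j < pos:  # bubble the element rightwards into place
            current[j], current[j + 1] = current[j + 1], current[j]
            swaps.append([j + 1, j + 2])
            j += 1
    return swaps
```

Each exchange corresponds to one interchange of adjacent loops, which is why permuting a nest can be reduced to a sequence of pairwise interchanges.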

permute_loops := func(pgm, old_order, new_order);

Given an old outside-in order for a set of perfectly nested loops and a new order, return 
a copy of ‘pgm’ in which the loop nest is permuted into the new order. 

interchange_loops := func(pgm, s1, s2);

Return a copy of ‘pgm’ in which the perfectly-nested loops ‘s1’ and ‘s2’ are
interchanged.

legal_to_interchange := func(pgm, outer_do, inner_do);

Test whether interchanging the given loops would violate a data or control dependence. 
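For data dependences, the usual textbook criterion can be stated in terms of direction vectors: swap the entries for the two loops in every vector, and reject the interchange if any vector's first non-‘=’ entry becomes ‘>’. The Python sketch below covers only that data-dependence part, with hypothetical parameters. It is not Delta's routine and does not check control dependences.

```python
from typing import Iterable, Sequence


def legal_to_interchange(directions: Iterable[Sequence[str]], k: int) -> bool:
    """Can loops k and k+1 (0-based, outermost first) be interchanged?

    Each direction vector uses '<', '=' and '>'; the interchange is legal
    when no vector becomes lexicographically negative after the swap.
    """
    for d in directions:
        swapped = list(d)
        swapped[k], swapped[k + 1] = swapped[k + 1], swapped[k]
        first = next((c for c in swapped if c != "="), "=")
        if first == ">":
            return False  # the sink would now execute before its source
    return True
```

A (‘<’, ‘>’) dependence is the classic case that forbids interchange.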

interchange := func(pgm, l1, l2);

Cover function for interchange_loops that first checks whether it is legal to interchange
the loops.

Loop distribution 

distribute_loop := func(pgm, l, break_after); 

Break loop ‘l’ in two after its statement ‘break_after’. 

smash_loop := func(pgm,doloop); 

Build the piblocks (strongly connected components of the dependence graph) for a loop, then use them to smash the loop into the smallest possible pieces. 
Start by smashing the inner loops recursively (therefore this is inside-out distribution). 
Return the modified program and the last resulting ENDDO. 
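Because dependences between different piblocks never form a cycle, emitting one loop per piblock in a topological order of the blocks preserves every dependence. A minimal Python sketch of the piblock computation using Tarjan's strongly-connected-components algorithm; it assumes statements are numbered 0..n-1 and dependences are given as (source, sink) pairs, and the name piblocks is hypothetical rather than the Delta SETL routine.

```python
def piblocks(n, edges):
    """Return the pi-blocks (SCCs) of statements 0..n-1 in a topological order.

    edges is an iterable of (source, sink) dependence pairs.
    """
    succ = {v: [] for v in range(n)}
    for src, dst in edges:
        succ[src].append(dst)

    index, low = {}, {}
    stack, on_stack = [], set()
    blocks = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            block = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                block.append(w)
                if w == v:
                    break
            blocks.append(sorted(block))

    for v in range(n):
        if v not in index:
            visit(v)
    # Tarjan emits components sinks-first, so reverse for a topological order.
    blocks.reverse()
    return blocks


# Statements 0 and 1 form a recurrence; statement 2 depends on 1.
print(piblocks(3, [(0, 1), (1, 0), (1, 2)]))  # [[0, 1], [2]]
```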
Stripmining [Lov77] 

strip_vertical_inner := func(pgm, do_loop, stripsize); 
strip_vertical_outer := func(pgm, do_loop, stripsize); 
strip_horizontal_inner := func(pgm, do_loop, stripsize); 
strip_horizontal_outer := func(pgm, do_loop, stripsize); 
stripmine := func(pgm, do_loop, length); 

Return a version of ‘pgm’ and the statement tags of two perfectly nested loops which 
together control the iteration space of the given ‘do_loop’. The iteration space is split 
into strips of a positive integer size computable by the expression ‘stripsize’ (or given 
by the constant ‘length’). The bounds expressions of ‘do_loop’ should be invariant in 
the loop. 

This technique is widely used by vectorizers, concurrentizers, and ordinary compilers 
that attempt to improve data locality. For example, in strip_vertical_inner, the inner loop 
carries the original loop’s index variable over ‘strips’ of a given or computable size, using 
the same stride as the original loop. The new outer loop schedules enough strips to cover 
the original iteration space. 
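The loop structure described for strip_vertical_inner can be checked with a minimal Python sketch that enumerates the index values each strip would execute. It assumes a positive stride and strip size; the name strip_mine is hypothetical and the sketch works on iteration values, not on program text.

```python
def strip_mine(lo, hi, step, strip):
    """Return the index values of DO I = lo, hi, step, grouped into strips.

    Assumes step > 0 and strip > 0. The outer loop steps over strip start
    points; the inner loop keeps the original stride within one strip.
    """
    if step <= 0 or strip <= 0:
        raise ValueError("step and strip must be positive")
    strips = []
    for start in range(lo, hi + 1, step * strip):        # new outer loop
        end = min(start + step * (strip - 1), hi)
        strips.append(list(range(start, end + 1, step)))  # new inner loop
    return strips


print(strip_mine(1, 10, 1, 4))  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```

In Fortran terms, with hypothetical names IS and STRIP, the two loops would be DO IS = LO, HI, STEP*STRIP and DO I = IS, MIN(IS+STEP*(STRIP-1), HI), STEP. Concatenating the strips always reproduces the original iteration sequence.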

Vectorizing and Concurrentizing 
tiny_vectorizer := func(pgm); 

Return a version of ‘pgm’ in which loops are marked as DOALL or sum based on a 
small amount of dependence pattern recognition. 

trivial_subscript_test := func(pgm); 

Look at all of the dependences in @loop_info[pgm]. Eliminate any which have both a ‘j’ 
in some direction vector position and a source and sink subscripted by the index for the 
corresponding loop. Return the reduced dependence graph. 

trivially_parallel := func(pgm,l); 

Return true if there are no non-‘=’ dependence arcs in the graph for this loop. 

trivial_summation := func(pgm,l); 

Determine whether this loop is a sum of trivial form, e.g. scalar = scalar + ‘invariant stuff’. 
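A minimal Python sketch of the two tests. It assumes each dependence arc of the loop has been reduced to a direction-vector tuple (outermost loop first) and that expressions are nested tuples (operator, operand, ...) with variable names as strings; these representations and signatures are hypothetical, not Delta's data structures. The summation test reads ‘invariant stuff’ as any expression that does not mention the scalar itself, which is an interpretation.

```python
def trivially_parallel(direction_vectors, level):
    """True if no dependence arc has a non-'=' entry at this loop's level."""
    return all(dv[level] == "=" for dv in direction_vectors)


def mentions(expr, name):
    """True if variable `name` occurs anywhere in the expression."""
    if isinstance(expr, str):
        return expr == name
    if not isinstance(expr, tuple):
        return False                      # numeric constants and the like
    return any(mentions(operand, name) for operand in expr[1:])


def trivial_summation(lhs, rhs):
    """True for lhs = lhs + e or lhs = e + lhs, where e does not mention lhs."""
    if isinstance(rhs, tuple) and len(rhs) == 3 and rhs[0] == "+":
        _, left, right = rhs
        if left == lhs and not mentions(right, lhs):
            return True
        if right == lhs and not mentions(left, lhs):
            return True
    return False


# S = S + A(I) is a trivial sum; S = S + S*X is not.
print(trivial_summation("S", ("+", "S", ("index", "A", "I"))))  # True
print(trivial_summation("S", ("+", "S", ("*", "S", "X"))))      # False
```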

Instrumentation 

instrument_all_outer_loops := func( pgm ); 

Return a copy of ‘pgm’ in which each outermost loop is bracketed by a pair of subroutine 
calls that collect and identify timings of individual loop nests. 

instrument_do_loop := func( pgm, doloop ); 

Return a copy of ‘pgm’ in which ‘doloop’ is bracketed by calls to do$entry and 
do$exit, with the statement tag of the loop as their argument. 

critical_path_program := func( pgm ); 

Return a version of ‘pgm’ instrumented so that executing it serially will measure the 
longest critical path of data or control dependence for that program run. 
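What the instrumented program computes can be sketched with shadow timestamps: each variable records when its current value was produced, and each operation finishes one unit after its latest input. A minimal Python sketch over an execution trace, assuming unit-cost operations and only flow (true) data dependences, as if every value were renamed; control dependences are ignored. The name critical_path_length and the trace format are hypothetical; critical_path_program instead instruments ‘pgm’ so the program measures this itself.

```python
def critical_path_length(trace):
    """Length of the longest flow-dependence chain in an execution trace.

    trace: list of (target, sources) pairs in serial execution order.
    Every operation is assumed to take one time unit.
    """
    ready = {}   # variable name -> time its current value is produced
    longest = 0
    for target, sources in trace:
        start = max((ready.get(s, 0) for s in sources), default=0)
        ready[target] = start + 1
        longest = max(longest, start + 1)
    return longest


trace = [("a", []), ("b", ["a"]), ("c", ["a"]), ("d", ["b", "c"])]
print(critical_path_length(trace))  # 3
```

Four operations with a critical path of 3 suggest an average parallelism of 4/3 for this toy trace.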
Output 

write_program := func(pgm); 
display_program := func(pgm); 
write_program_to_file := func(pgm, fname); 

write_program_fd := func(pgm, all_info, fd); 

Write out a program unit with the given name. 

write_stmt_string := func(string, indent, heading_width, fd); 

Write out a string as a statement with indentation level ‘indent’ and with continuation 
lines preceded by ‘heading_width’ spaces. Analyze while printing to see if continuation 
lines can be indented safely. The indentation level should not include the initial 6 columns. 
Also, the initial line will not have indentation or headings printed by this program; that is 
the responsibility of the caller. 
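For reference, FORTRAN 77 fixed form places statement text in columns 7 through 72 and marks a continuation line with a non-blank, non-zero character in column 6. A minimal Python sketch of splitting one statement under those rules. Unlike write_stmt_string it never indents continuation lines, since added blanks would become significant if a split fell inside a character constant. The name fixed_form_lines and the '&' marker are hypothetical choices, and the standard's limit of 19 continuation lines is not enforced.

```python
def fixed_form_lines(stmt, indent=0, width=72):
    """Split one statement into FORTRAN 77 fixed-form source lines.

    Columns 1-6 of the first line are blank; continuation lines carry '&'
    in column 6. Statement text occupies columns 7..width.
    """
    room = width - 6                 # columns available for statement text
    if not 0 <= indent < room:
        raise ValueError("indent must leave room for statement text")
    lines = [" " * (6 + indent) + stmt[: room - indent]]
    rest = stmt[room - indent:]
    while rest:
        lines.append("     &" + rest[:room])   # five blanks, then column 6
        rest = rest[room:]
    return lines


for line in fixed_form_lines("X = " + "A + " * 40 + "B", indent=3):
    print(line)
```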

type_decls := func(pgm, fd); 
parameter_decls := func(pgm, fd); 
format_dim := func(pgm, dim); 
array_decls := func(pgm, fd); 
common_decls := func(pgm, fd); 
equivalence_decls := func(pgm, fd); 
data_decls := func(pgm, fd); 
save_decls := func(pgm, fd); 
external_decls := func(pgm, fd); 

Produce the declarative statements that establish the environment for the execution of ‘pgm’. 
FORTRAN 77 output syntax only! 

write_stmt := func(pgm, stmt, all_info, marking); 
write_stmt_fd := func(pgm, stmt, all_info, marking, fd); 

Write out executable statement ‘stmt’. Return the next statement, or OM if there is none. 
Print extra information if ‘all_info’ is set, and prepend ‘marking’. 

write_all_stmts := func(pgm, initial, final, all_info, fd); 

Write out all the statements in the given set that are in the chain starting with ‘initial’, 
connected by @next, and ending with ‘final’. 

write_do_loop := func(pgm, do_stmt); 
write_do_loop_fd := func(pgm, do_stmt, fd); 

Write out one do loop. 

statement_label_value := func(pgm,s); 

Return a string form of the numeric statement label for statement ‘s’. 

write_format_stmts := func(pgm, fd); 

Write out any format statements in the program. 

format_expr := func(pgm, e, give_values); 
write_expr := func(pgm, e, give_values); 
writeln_expr := func(pgm, e, give_values); 

Respectively: convert an expression tree into a string for output; print out an expression 
tree; write out an expression labelled by its number, followed by a newline. 

new_identifier := func(pgm, id); 
create_identifier := func(pgm, base_id, id_type); 
create_simple_variable := func( pgm, name, type, size ); 
new_critical_identifier := func(pgm, id); 
make_label := func(prefix); 

Serve various needs for producing new names for compiler-generated components of the 
program. 

A.6 SETL Utilities 

integer_to_string := func(n); 

Return the character string representation of the integer ‘n’. 

string_to_integer := func(s); 

Return the integer represented by the character string ‘s’. 

misc_to_string := func(int_or_string); 

Given a parameter that is an integer or a string, return the string equivalent. 

pad_string := func(s, length); 
left_pad_string := func(s, length); 

Given a string and a minimum length, pad it with blanks to that length if necessary. 

compact_object := func(obj); 

Return its argument with structure sharing among leaf nodes. 

negp := func( x ); 
posp := func( x ); 

Return the negative/positive part of an integer. 

iNEG := func( a ); 
iADD := func( a, b ); 
iSUB := func( a, b ); 
iMUL := func( a, b ); 

Operate on the arguments taking care of infinities and OM. 
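The conventions for infinities and OM are not spelled out here, so the following minimal Python sketch states its own: None stands in for SETL's OM and propagates through every operation, math.inf and -math.inf stand for the infinities, inf + (-inf) is treated as undefined (OM), and 0 times an infinity is taken as 0. All of these, and the Python names, are assumptions rather than the Delta routines' documented behavior.

```python
import math

OM = None  # stand-in for SETL's undefined value (assumption)


def i_neg(a):
    """Negate, propagating OM; inf and -inf map to each other."""
    return OM if a is OM else -a


def i_add(a, b):
    """Add, propagating OM; inf + (-inf) is treated as undefined (assumption)."""
    if a is OM or b is OM:
        return OM
    if math.isinf(a) and math.isinf(b) and a != b:
        return OM
    return a + b


def i_sub(a, b):
    """Subtract by adding the negation, so the same rules apply."""
    return i_add(a, i_neg(b))


def i_mul(a, b):
    """Multiply, propagating OM; 0 * inf is taken as 0 (assumption)."""
    if a is OM or b is OM:
        return OM
    if a == 0 or b == 0:
        return 0
    return a * b


print(i_add(5, math.inf), i_mul(0, -math.inf), i_sub(math.inf, math.inf))  # inf 0 None
```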

gcd := func( a, b ); 
gcdn := func( list ); 

Compute the GCD of a pair (list) of integers. 

divides := func( a, b ); 

Determine if a divides b. 
between := func(a, b, c); 

Determine whether the relation a <= b <= c holds. 
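A minimal Python sketch of gcd, gcdn, divides and between, assuming the relation tested by between is a <= b <= c and adopting the conventions gcd(0, 0) = 0 and "0 divides only 0". These helpers are presumably used in dependence tests such as the classic GCD test: a1*i - a2*j = c can have integer solutions only if gcd(a1, a2) divides c.

```python
from functools import reduce


def gcd(a, b):
    """Euclid's algorithm on absolute values; gcd(0, 0) is 0."""
    a, b = abs(a), abs(b)
    while b:
        a, b = b, a % b
    return a


def gcdn(values):
    """GCD of a list of integers; the GCD of an empty list is 0."""
    return reduce(gcd, values, 0)


def divides(a, b):
    """True if a divides b; by convention 0 divides only 0."""
    if a == 0:
        return b == 0
    return b % a == 0


def between(a, b, c):
    """True if a <= b <= c."""
    return a <= b <= c


# GCD test for 4*i - 6*j = 3: gcd(4, 6) = 2 does not divide 3, so no dependence.
print(divides(gcdn([4, -6]), 3))  # False
```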

tuple_reverse := func( x ); 

Reverse the order of elements in a tuple. 

tuple_index := func( x, T ); 

Return the index of the first occurrence of ‘x’ in tuple ‘T’, or OM if none. 

laminate := func(list1, list2); 

Laminate two tuples into a tuple of tuples. 

split_tuple := func( tpl, test ); 

Split a tuple into two parts: the elements that pass the test and those that do not. 

list_strings := func(separator, strings); 

Concatenate a set or tuple of strings, separating them with ‘separator’. 

commas_between := func(list); 

Concatenate the strings in ‘list’, separating them by ‘, ’. 


limit_tuple_length := func( list, limit ); 

Given a tuple of strings, return a tuple of tuples, each of which is shorter than ‘limit’ 
when its strings are separated by commas. 
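A minimal greedy Python sketch of this grouping, assuming ', ' as the separator and that a string which on its own reaches the limit is placed in a group by itself rather than split. The separator choice and the handling of over-long strings are assumptions; a natural use is breaking a long declaration list into several statements.

```python
def limit_tuple_length(strings, limit):
    """Greedily pack strings into groups whose ', '-joined length is below limit."""
    groups, current, length = [], [], 0
    for s in strings:
        joined = len(s) if not current else length + 2 + len(s)
        if current and joined >= limit:
            groups.append(current)         # close the full group
            current, length = [s], len(s)
        else:
            current.append(s)
            length = joined
    if current:
        groups.append(current)
    return groups


print(limit_tuple_length(["AA", "BB", "CC"], 7))  # [['AA', 'BB'], ['CC']]
```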